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TECHNICAL PUBLICATION 


ESTIMATING COSMIC-RAY SPECTRAL PARAMETERS FROM SIMULATED DETECTOR 
RESPONSES WITH DETECTOR DESIGN IMPLICATIONS 

1. INTRODUCTION 


This Technical Publication (TP) develops statistical methods for estimating the three spectral 
parameters of the broken power law energy spectrum. Estimation of these parameters and quantification of 
the surrounding uncertainty of the estimates are of considerable importance to designers of cosmic-ray 
detectors. 

Analytical methods were developed in conjunction with a Monte Carlo simulation to explore the 
combination of the expected cosmic-ray environment with a generic space-based detector and its planned 
life cycle, allowing us to explore various detector features and their subsequent impact on estimating the 
spectral parameters. This study thereby permits instrument developers to make important trade studies in 
design parameters as a function of the science objectives, which is particularly important for space-based 
detectors where physical parameters, such as dimension and weight, impose rigorous practical limits to the 
design envelope. 

A simple power law model consisting of a single spectral index (oq) is believed to be an adequate 
description of the galactic cosmic-ray (GCR) proton flux at energies below 10 13 eV, with a hypothesized 
transition at knee energy (E k ) to a steeper spectral index a 2 >(X | above E k . Methods for estimating these 
three spectral parameters are developed in this TP. Because many of the features and analytical tools 
related to a simple power law have natural extensions to the analysis of this so-called broken power law, 
these methodologies will be discussed in detail first. 


2. SIMPLE POWER LAW 


The simple power law suggests that the number of protons detected above an energy (E) for an 
assumed collecting power (product of size and observing time) is given by:* 


( zr \ _a l + * 

W 0 (>E) = N^— J ■ (l) 

where E is in units TeV, a, is believed to be =2.8, and N A and E A are numbers determined from the detector 
size and exposure time in the environment, respectively. For a typical space-based detector of 1 m 2 with a 
3-yr program life, N A and E A are 160 and 500 TeV, respectively, implying that this detector is expected to 
observe 160 proton events above 500 TeV over its expected life cycle. In statistical terms, V 0 is assumed to 
represent an average number of events while the actual number to be observed on any given mission would 
follow the Poisson probability distribution with mean number N 0 . The number of particles detected is 
taken to depend only on the geometrical factor of the assumed detector and its material composition. The 
detection efficiency is a convolution of the geometry and material composition and is taken to be indepen- 
dent of energy. 

The associated cumulative probability distribution function (cdf) for E over some energy interval 
of interest [E^Ej] is then given by 
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Thus, the corresponding probability density function (pdf) for E is 
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for £] < E < £2 


(3) 


To randomly sample GCR proton event energies from the simple power spectrum over the interval 
\E\,E 2 \, i/ ( -O 0 (E ( ) is solved in terms of £, to obtain 


£,=Oo , («,) = 


-a. 


+ Uj(E : 


i— OTj 


-£, 


I-a, J l-a, 


(4) 


where u i is a simulated random number from a standard uniform distibution and O 0 _l represents the inverse 
function of <J> 0 , which is a conventional notation that will be used in subsequent sections. The mean of the 
simple power law distribution is determined by the expected value operator <E> which gives 
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The variance is given as o 2 E = <E 2 > - (<E>) 2 , where the general form of <E m > is 
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At this time, note the critical point that <E 2 > becomes infinite, as do all other higher moments, as 
E-, goes to infinity, as is easily seen in eq. (7): 


Iim f ft .v 2 x X dx = oo for all A < 3 and a>0 . (7) 
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This observation suggests the need for a careful look at the effects of the large variance and other 
higher moments associated with all power law distributions, even when £ 2 is kept finite. A measure of the 
relative dispersion of the energies of the incident protons, which is independent of units, is defined by 
V=< 7 E /p E for the simple power law and is called the coefficient of variation in the statistical literature. An 
important concept in detector design is the energy resolution p of the detector that provides a measure of 
the relative accuracy of a cosmic-ray detector, which is the fractional error in measurements of a 
monoenergetic beam. The resolution p is defined as the standard deviation divided by the mean response 
with typical values of 30 to 40 percent. 

As will be shown in this TP, the precision with which the spectral parameter a, can be estimated 
from a set of detector responses (energy deposits), measured in terms of its standard de viation, is a function 
of both the variance of the incident energies and the uncertainty induced by the detector. The dominating 
component of this measurement precision will be shown to be attributable to the variance of the incident 
energies <7 E , which in turn can only be controlled through collecting power. Since V and p are dimension- 
less and provide a measure of relative dispersion for the power law distribution and detector, respectively, 
an instructive comparison will show that V»p. To illustrate these points, a detector-life cycle having 
parameters N A =\60 and £ A =500 TeV will observe 52,200 events on average in the energy range £,=20 
TeV to £ 2 =5,500 TeV from a simple power law spectrum when a, is 2.8, which gives a mean GCR event 
energy p E =44.5 TeV, a standard deviation cr £ =74.10 TeV, and a coefficient of variation V= 1 66.5 percent. 
In comparison, the resolution p of most detectors is between 30 and 40 percent. £ 2 is chosen for this 
detector-life cycle combination as 5,500 TeV, since the expected number of events above this energy are 
negligible, while £, is taken to be 20 TeV for purposes of this discussion. 

Since the number of events and their incident energies will vary because of the finite detector size 
and exposure time, the statistical behavior of the GCR event energies in combination with a detector 
having energy resolution p and the subsequent spectral parameter estimates over multiple missions will be 
studied. Thus, for each mission, a random number N of GCR events from a Poisson distribution with mean 
52,200 to represent the number of simulated events that the detector will observe in the energy range 20 to 
5,500 TeV on any given mission will be generated. 

Next, the incident energy of each of these N events using eq. (4) is simulated. For example, for one 
such simulated mission, N= 51,883 and the mean and standard deviation of the simulated GCR incident 
energies are calculated to be 43.85 and 66.39 TeV, respectively. To illustrate the large fluctuations associ- 
ated with power law distributions, the same number of events (5 1 ,883) are simulated from a normal distri- 
bution having a mean of 44.5 and standard deviation 74. 1 so as to match the power law’s mean and standard 
deviation for this energy range when a, =2.8 and observe that the sample mean and standard deviation are 
44.51 and 74.17, respectively, for a single sample mission, which are much closer to the population mean 
and variance than those from the power law random samples. This process is repeated for 100 missions, 
and the standard deviation for each mission is plotted in Figure 1 . 

Note the large fluctuations of the standard deviations for the power law samples from mission to 
mission, while in contrast, the standard deviations of missions generated from a normal distribution are 
very stable. As will be seen in subsequent sections, this is why the variation in detector responses is domi- 
nated by the variation of GCR event energies, while the additional variation induced by the detector’s 
energy resolution plays a rather minor role. This in tum contributes the dominant component of the stan- 
dard deviation of the spectral parameter estimator. 
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Power Law Sigma 
Sigma (Normal Distribution) 



Figure 1 . Standard deviation of simulated incident energies from power 
law (ragged curve) for 100 missions compared with that from 
normal distribution having same mean and variance. 


The variation of the sample standard deviation s, measured by its standard deviation, is given by 


< 7 * = 


I tuzA 

4/j 2 N 


( 8 ) 


where /i r is the rth central moment about the mean, 2 defined for the simple power law as 


. (9) 

Thus, the large variation in mission standard deviations is due to the term fi 4 , which again is only 
finite by setting E 2 to a finite value, but nevertheless is responsible for the erratic behavior of the mission- 
to-mission sample standar d de viations as depicted in figure 1. This erratic behavior of the observed mis- 
sion standard deviations will necessarily be true for any power law having spectral index a t <5 . Note that 
for the normal distribution, 




a 

Tin ’ 


( 10 ) 


and evaluation of these two formulae yield <7=5 TeV for the simple power law and 0.229 for the normal 
distribution, which is roughly a factor of 22. 
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2.1 Estimation of the Spectral Parameter 


Of particular interest in the study of cosmic-rays is the estimation of the spectral parameter (X[ from 
a set of data. Even though in practice the actual incident GCR energies are never observed, but only 
a measure of their energy deposition from their passage through the detector, it is important to consider the 
concept of an ideal detector having zero resolution. Thus, such a detector would measure the GCR event 
energies exactly. 

2.1.1 Method of Moments 

The method of moments consists of equating the sample moments with the population moments, 
which in general leads to k simultaneous nonlinear algebraic equations in the k unknown population 
parameters. For the simple power law, there is only one parameter to be estimated, so the sample mean E 
is set to the population mean /j e in eq. (5) and then this nonlinear equation is solved in terms of d t , where 


a | - 2 




- E 


2-cq 


d-Cfi 


-E 


1 -«i 


( 11 ) 


Thus, for a given sample of size N, this equation is solved in terms of oc\ by numerical methods to 
provide an estimate of Ot \ . This estimator, which is a function of the random variable E , has its own 
associated pdf. Since the GCR incident energy E has mean jU^and finite variance 0 E ~ (only because the 
upper energy E 2 is finite), it is known by the Central Limit Theorem that the distribution of the sample 
average E follows a normal distribution with mean }i E and variance 0 / "V N . 

For example, when a,=2.8, £,=20 TeV, £ 2 =5,500 TeV, E is normally distributed with mean 
44.5 TeV and standard deviation (74.1 TeVyN 1 ':. These results can be used to obtain the probability 
distribution of the estimator by solving the probability equation: 


(a ,-r 

i 2 (X\ j— ’ 2 cx | 

\ E \ - E 2 

-44.5 


Ui- 2 , 

I 1^1 — &\ _ 77 1 — ^ | 

^ 7 

| 1 


74.1 
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J V2 n 
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— DO 


( 12 ) 


in terms of « 1 for various values of Z. Letting Z vary from -4.7 to 4.7 and setting (V=52,000 events gives 
the probability distribution of oc\ shown in figure 2. Also depicted in figure 2 is the relative frequency 
histogram of the estimates a, , based on 5,000 simulated missions; where for each mission, 52,000 events, 
on average, are simulated and the estimate of obtained by solving eq. (11). Furthermore, even though an 
explicit mathematical form for the pdf is not readily available, its mean and standard deviation can be 
calculated by numerical methods. For the distribution shown here, a numerically evaluation reveals its 
mean to be 2.800 and standard deviation as 0.0115 when V=52,000, which compares to the mean and 
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Theoretical Distribution ot 6! Versus 
Histogram of Estimates Based on 5,000 Missions 



Figure 2. Probability distribution of method of moments estimate of a j 

with relative frequency histogram of spectral parameter estimates 
obtained from simulation. 


standard deviation of the 5,000 simulated estimates with 2.800 and 0.01 14, respectively. With the ability to 
numerically construct this estimator’s pdf and moments, the important result is that its variance is inversely 
proportional to the sample size N, which is also true for many common estimators; e.g., the sample mean, 
standard deviation, and median. For example, if the number of events is doubled, then the variance is 
halved; and if the number of events is halved, then the variance doubles. Note that this relationship be- 
tween sample size and the standard deviation of the estimator oc\ is based on keeping £ | and £2 fixed, so 
that in practice, the variance can be reduced by increasing the size and/or observing time. 

2.1.2 Method of Maximum Likelihood 

The likelihood function of a random sample from the simple power law, regarded as a function of 
the single unknown parameter a,, is 


vIV 


£(«,) = 


,1 -a, 


V."l 


cq - 1 


- £ 
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£1 < Ej < E 2 


(13) 


The method of maximum likelihood (ML) seeks as the estimate of a ( that value (say, 0C ML ) which 
maximizes the likelihood function so that L(otj V j^ ) >£((X|) for all ot j . Statistically speaking, this means that 
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the ML estimator leads us to a choice of a, that maximizes the probability of obtaining the observed data. 
In practice, it is often simpler to work with the logarithm of the likelihood function and seek solutions of 
(log L )'= 0 for which (log L)" <0 (indicating a maximum), where the prime and double prime indicate the 
first and second derivative, respectively. Thus, eq. (14) is numerically solved in terms of a, to obtain the 
ML estimate 


aiogt N [ (log£| -(logE 2 )E'~"' 

da } cq-1 £, I_a| -E ] 2 ~ a ' 


N 

Xlog£,=0 • 


/=! 


(14) 


The second derivative of the log-likelihood function is obtained next. Note that (log L) <0 for all 
a,, indicating that log L is concave; hence, there is a unique maximum, which was graphically observed by 
plotting log L as a function of a t : 


5 21 °g L _ v 
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(15) 


By the Cramer-Rao inequality, the lower bound of the variance of any estimator a of a x is given 

by: 3 


Var(d) > 


-1 

3 2 log! 

da\ 


(16) 


which is asymptotically attained by the ML estimator. Also note that it is inversely proportional to the 
number of events N as was the variance of the estimator obtained using the method of moments. Other 
important properties of ML estimators are (1) asymptotically normally distributed and (2) consistency or 
asymptotically unbiased. Thus, a key question is, ‘"For what values of N are these asymptotic properties 
achieved by the ML procedure?” 

Based on the same 5,000 mission set discussed in the previous section, the mean and standard 
deviation of the 5,000 ML estimates are 2.800 and 0.00782, respectively. Using eqs. (15) and (16), the 
Cramer-Rao bound is computed to be 0.00786 when A=52,000 and 0^=2.800, which compares very well 
with the simulation results. Furthermore, the frequency histogram of these 5,000 ML estimates resembled 
the normal distribution as stated in (1) of the above paragraph. A separate simulation study was conducted 
in which the sample size N was gradually reduced from 52,000 to 200, and the two asymptotic properties 
(1) attaining the Cramer-Rao bound and (2) consistency, were achieved by the ML estimates until around 
N= 1,200. A bias on the high side of a ML and failure to attain the Cramer-Rao bound became more and 
more evident as the number of events N diminished from 1 ,200 to 200. 


Another very important comparison is the ratio of the standard deviation of a ML to that of the 
estimator obtained using the method of moments. Direct calculation shows this ratio is roughly 1 .45, 
implying that the ML procedure is significantly better than the method of moments when dealing with the 
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simple power law. This result is not too surprising, however, because ML estimators, in general, have 
better statistical properties than the estimators obtained by the method of moments. 4 

2.2 Detector Response Function 

An original goal of this research was to create a Monte Carlo simulation in which various detector 
response functions describing the distribution of energy deposition in the detector as a function of incident 
GCR proton energy could be inserted. This desired flexibility led us to seek a numerical solution instead of 
a completely analytical approach. 

Based on GEANT simulations of energy deposition for monoenergetic protons at specified ener- 
gies at 0. 1 , 1 , 1 0, 1 00, 1 ,000, and 5,000 TeV, the Gaussian distribution provided a reasonable description of 
the distribution of energy depositions at each of these incident energies. 5 Furthermore, the mean detector 
response was found to be well approximated by a linear function of incident energy in the range of interest 
for this study, which is typically between 10 and 5,500 TeV. Other detector response functions, such as a 
gamma distribution and another response function constructed from a combination of normal distributions 
having different parameters, have also been investigated and are presented in the broken power law section 
of this TP. 

The random variable Y is introduced to represent the detector’s response in terms of energy deposi- 
tion of a GCR proton of incident energy E, and the conditional mean response and standard deviation of Y 
for a given event energy E modeled as fJy\E =( a + bE) and o Y \ E = (c + <JE), respectively, where the four 
coefficients a, b, c, and d are estimated using linear regression on the GEANT simulation results. Thus, for 
each simulated incident GCR proton energy £), the detector response is simulated as 

Y i = Pri£, +<J Y\E, Z i ^ 


or 


Y i =(a + bE i ) + (c + dE i )Z i , 


(18) 


with the nonnegativity constraint y,>0 and where Z, is a standard normal random number having zero 
mean and unit standard deviation. Thus, the detector response function is defined as 


g(y I E) = 


% IE 

I e 

^2n<Jy\E 


2cr > 2 \E 


y >0 


(19) 


where t] x \ E is a normalizing coefficient related to the truncation of the normal distribution resulting from 
the constraint y>0. It is worth noting for constant resolution studies in which a Gaussian response function 
is assumed and p=<r/p is set to values 0.4 and 0.6, the corresponding detector energy resolution is 39 and 
51 percent, respectively, and is rounded to 40 and 50 percent in the figures and tables in this TP. 
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Thus, 7] | £ is determined from 


1 


oo 


( 20 ) 


%•!£ 


,2 



Pyl£ 


where the lower limit of integration is -1 divided by the resolution function given as 

Py\E = ®Y\E - P Y\E = 0 dE)/(a + bE ) . (21) 

First, it is worthwhile to consider a detector having energy resolution Py^^yiE ^Py\E a cons t ara P 
and independent of the cosmic-ray’s energy (E) so that g Y \e = P Py\E « where tyP ical values of interest for p 
are 0, 0.2, 0.3, 0.4, and 0.6. It should also be noted that the normalizing coefficient r) in eq. (20) is constant 
whenever the detector resolution p is energy independent. 

Second, a case where p m and °Vl£ are l' near but their ratio is not a constant so that the detector’s 
resolution is a nonlinear function of incident energy E was investigated. For this second scenario, two 
studies were conducted in which the resolution is getting better from 40-percent resolution at 20 TeV to 
30-percent resolution at 5,500 TeV and then getting worse from 30-percent resolution at 20 TeV to 40- 
percent resolution at 5,500 TeV. These two energy-dependent cases are presented in the broken power law 
section. 

For detectors having constant energy resolution p, rj is also a constant but depends on p. and is 
given in table 1 for several values of energy resolution. 


Table 1 . Normalizing coefficient r\ for Gaussian response function. 



Constant Resolution (p) 

10% 

20% 

30% 

40% 

50% 

60% 

Truncated 

0 

2.9E-07 

0.00043 

0.00621 

0.02275 

0.04779 

Probability 







T1 

1 

1 

1.00043 

1.00625 

1.02328 

1.05019 


2.3 Probability Distribution of the Detector Response 

The probability distribution for the detector response in the presence of the simple power law 
energy spectrum over the energy range [E],E 2 ] 
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( 22 ) 


go(y'’ a O= J g(y\E;p)(/) 0 (E;a l )dE , y> 0 . 

The spectral parameter a | has been explicitly included in the argument list of both the simple 
power law pdf as <p 0 (E:a { ) and the detector response distribution g 0 (y;a,) in eq. (22) to indicate that this 
spectral index is inherited through the integral. 

2.4 Ideal Detector 

The concept of a zero-resolution or ideal detector is very useful because it sets an upper bound on 
the expected performance of any real detector. Furthermore, it allows quantifying the magnitude of the 
uncertainty in the estimate of the spectral parameter, measured in terms of the standard deviation of the 
estimator, attributable to event statistics (statistical fluctuation of incident GCR proton energies) relative to 
the uncertainty in measuring the spectral parameter estimate induced by the detector’s nonzero energy 
resolution. 

Thus, for an ideal detector, p= 0 so that the standard deviation o Y \ E =0 for all GCR event energies E. 
Hence, the detector response to a GCR of energy E is given by Y=a+bE so that the incident energies may 
be directly obtained as E t -=(Xf-a)lb; therefore, the estimation procedures developed in sections 2.1.1 and 
2.1.2 apply. 

2.4.1 Method of Moments for a “Real” Detector 

The conditional expected value theorem, which says that the expected value of the conditional 
expected value is the unconditional expected value, 6 or in the notation of the mathematical expectation 
applied to the detector response Y, 


/Jy = <Y > = «Y I E » ’ 

to obtain the mean detector response p Y for a detector having constant resolution p: 


(23) 


p Y = (a + bp E )\ 


oo 

'+ pm _lin e 



(24) 


where p Y is the mean detector response (energy deposit) and p E is the mean of the simple power law 
distribution. The term involving the integral can be thought of as a correction term to the mean for the 
truncation given in table 1 and can be ignored whenever p<0.30; i.e., 30-percent resolution or better. Using 
the method of moments, p Y is estimated with the sample average F and when combined with eq. (5) for p £ , 
yields eq. (25) that can then be solved in terms of a, by numerical methods: 
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( 25 ) 



r « | -l^l 2 ~ A| - £ 2~" 1 . 
K a\ -2J £j~ a i -E ] 2 ~ a] 


For example, when the resolution is a constant 40 percent (p=0.40), the point estimate of the spec- 
tral parameter cq based on the 5,000 missions is 2.801 using eq. (25) and 2.79 using the same equation but 
with the correction term set to zero in the denominator, resulting in a bias of =0.01 that can be removed by 
including this correction term. This effect is much more pronounced when p= 0.60 and results in a bias of 
0.1 in the point estimate of oq so that the correction term is critical. 

When the detector response distribution is symmetric and truncation is negligible so that /J Y =(a+b^^), 
then a, can always be estimated using the mean of the detector responses Y to estimate fd Y ' n e< T (24). This 
implies that knowledge of the variance of the detector distribution, and hence the resolution, is really not 
required in order to estimate a,, provided knowing that the resolution is <30 percent so the effect of 
truncation can be ignored. 

This is a useful result, because if the uncertainty regarding the true resolution is non-negligible, 
then the method of moments provides a way to proceed with the estimation of a,; e.g., the detector’s 
energy resolution is known to be <30 percent but nothing more. However, as already noted, the method of 
moments does not provide the minimum variance estimator that the ML method does which requires a 
complete specification of the detector parameters a, b, c, and d of this assumed Gaussian response func- 
tion. Furthermore, the energy resolution of most real detectors is worse than 30 percent. 

This estimator based on the method of moments is a function of the random variable Y and has its 
own associated pdf. Since Thas mean p y and variance d 2 Y , it is known by the Central Limit Theorem that 
the distribution of F follows a normal distribution with mean fi Y and variance C 2 Y /N. Thus, the variance of 
the detector response Y is cs Y = < T 2 > -|i Y > where 


< Y l > = (o z + labn E + b 2 o\ + b A p|;)n(p)| 


2. .2 


2 _ 2 _ 


f e 2 dx 

VT7r 


(26) 


For example, when a, =2.8, £,=20 TeV, £ 2 =5,500 TeV, and p=0.40, Y is normally distributed with 
mean 131.58 GeV and standard deviation (213.69 GeV)/N V 2 . The probability distribution of &\, along 
with its mean and standard deviation, can be obtained by solving the probability equation in eq. (27) using 
the methods discussed with eq. (12): 
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If the truncation effect is assumed to be negligible in eq. (26), then the following succinct formula 
for the variance of the detector response as a function of detector parameters a, b, and p and the mean jJ E 
and variance CS E of the power law distribution is obtained: 

a 2 = b 2 a 2 E + p 2 [(a + bp E ) 2 + b 2 o E J ■ 

In terms of the standard deviation of the detector response < 7 V , the approximation in eq. (28) is seen 
to be actually quite good, for when p= 0.40, this formula yields, <r y =213.37 GeV as compared to the exact 
value of 213.69 GeV obtained from eq. (27) using the integral correction terms. When p- 0.60, this 
approximation yields <T y =237.3 1 GeV as compared to the actual value of 239.78 GeV Thus, ignoring the 
truncation is not too serious when estimating the standard deviation but can be devastating for p>0.40 
when estimating the mean p Y and hence a 1 when using the method of moments. Much insight into th e 
estimation of the spectral parameter a { can be gleaned from eq. (28) because it shows the relationship 
between the variance o Y of the detector response distribution, the variance c E of the GCR proton energy 
spectrum, and the detector response function parameters a , b, and p. 

The influence of the variance and other higher moments of the simple power law energy spectrum 
is visualized in figure 3 which shows the mean detector response (mean energy deposit) per mission for 30 
simulated missions in comparison with the mean incident proton energy for 30 missions. Cor responding 
standard deviations per mission are plotted in figure 4. Note that the detector response mean and standard 
deviation per mission tend to track the mean and standard deviation of the incident energies for the 30 
missions, illustrating the strong influence of the GCR energy mission-to-mission fluctuations on the detec- 
tor response variation, even in the presence of the “smearing” induced by this detector having 40-percent 
energy resolution. As will be seen in section 2.4.2, the component of variation due to the GCR event 
statistics will be the dominating component of the total variation in the standard deviation of the estimator 
of the spectral index a, . 
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— #— Mean Detector Response (40% Resolution) 



Figure 3. Comparing the mean incident energy with the mean 
of the detector responses for each of 30 missions. 


— Power Law Sigma 



Figure 4. Comparing the standard deviation of the GCR incident energies 
with the standard deviation of the detector responses for each 
of 30 missions. 


2.4.2 Maximum Likelihood for a “Real” Detector 

As in section 2. 1 .2, the method of ML seeks which maximizes the log-likelihood function so 
that log L(a ML )> log L(a{) for all ctj where the likelihood function for the detector response in the 
presence of the simple power law energy spectrum of N incident GCR protons over the energy range 
[E\,E 2 ] is 
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(29) 


N N 

log L(a 1 ) = ]T logk'o (>’/•; «i )1 = X lo § 
7=1 ' 7=1 


j 




#(>’; l£)^> 0 (E;«| )dE 


Because of the complexity of the integral and the desired capability to easily change the functional form of 
the detector response function g in eq. (29), a numerical approach for obtaining a ML was chosen. Two 
optimization algorithms that do not require gradient information (derivatives) were selected for use; i.e., 
the multidimensional minimization algorithm called the Nelder-Mead downhill simplex method and Powell’s 
direction set method. 7 While both methods provided matching results and were about the same in terms of 
computer run time, the Nelder-Mead downhill simplex method was easier to control and modify the termi- 
nation criteria. Furthermore, the simplex method proved to be more robust with the emergence of multiple 
maxima in the likelihood function which occurred at the higher values of the knee location investigated in 
the broken power law section of this TP. Therefore, the discussion that follows is specific to the downhill 
simplex method. Since this is a minimization algorithm, the objective function is defined as 


N 

0(a, ) = - log L(a, ) = - J log 

7=1 


/ 


E\ 


g(yj \E)(t) 0 {E ;a l )dE , 


(30) 


so that minimizing 0(a\) maximizes log L(CC|) as desired, where the integral is numerically evaluated. The 
following two termination criteria are used to halt the search procedure for the ML estimate at the (/»+l) ,h 
iteration: 


(i) 
and 

(ii) \0(a l m+i )-O(a l m )\<£ 2 . . ... , (31) 

’ :r- } ■ ■ ■' p 1 . ' 

The search procedure continues until the termination criteria are met, which in words are: (i) the 
movement in successive step sizes of oq is <£j and (ii) the objective function is changing by an amount <e 2 . 
Typical values used for these two stopping tolerances are on the order of 10 _<s and seem reasonable in light 
of the magnitude of the parameter being estimated (=2.8) and the value of the objective function in the 
vicinity of the ML solution, 0(a ML ) being of the order of magnitude 10 5 when E, is taken to be anywhere 
between 10 and 30 TeV, so the number of terms in the sum is between 182,000 to 26,000, respectively. 
Furthermore, changing f] and/or e 2 ' n cither direction by an order of magnitude provided no noticeable 
change in results. 

Figure 5 shows the ML estimates of a ] for a zero-percent resolution detector obtained from eq. ( 1 4) 
in comparison with the ML estimates obtained from a 40-percent resolution detector and applying the 
downhill simplex algorithm to eq. (30) for 30 missions. This very close comparison suggests that the GCR 
event statistics are the dominating component of uncertainty in the estimation of the spectral parameter otj. 
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a, (Maximum Likelihood, 0% Resolution) 



Figure 5. Maximum likelihood estimates for zero- and 40-percent 
resolution detector for 30 missions. 


2.5 Summary Remarks and Conclusions for the Simple Power Law 

Two methods for estimating the single spectral index (a ( ) of a simple power law have been inves- 
tigated. The first method — the method of moments — was found to be very useful in studying the general 
nature of the statistical estimation problem as well as yielding an analytical solution that could be com- 
pared with Monte Carlo simulation results. Furthermore, when the detector resolution is better than 
30 percent so that the truncation of the detector response function is negligible, the method of moments 
provides an estimator of a,\ without requiring specific knowledge of the detector resolution p but only that 
it is better than 30 percent. This does not imply p is insignificant when it is <30 percent, but only that the 
correction terms previously discussed can be ignored and thus explicit knowledge is not needed of the 
value of p to estimate a v In fact, the standard deviation of the estimator increases as p increases as one 
would expect and results from the fact that whatever phappens to be, its impact is communicated to the 
estimate of a, through the variance of the detector mean response Y which is a function of p as indicated 
in eqs. (26)-(28). 

Another interesting result is that when the resolution is <30 percent, it is not necessary to know the 
explicit functional form of the detector model, but only that it is symmetric. Unfortunately, most detector 
response functions are worse than 30-percent resolution and may be asymmetric as well. 

The method of ML estimation clearly stands out as the method of choice for estimating a, in terms 
of minimum variance and consistency (asymptotically unbiased), as well as asymptotic normality which 
allows for probabilistic statements, such as confidence intervals for the unknown spectral parameter. These 
results as a function of detector resolution are shown in figure 6. 
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Maximum Likelihood and Method of Moments 
Estimator of Spectral Prameter a 1 
(Simple Power Law) Versus Detector Resolution 


■ Method of Moments (Simulation) 



Figure 6. Comparison between method of moments and maximum 
likelihood as a function of detector resolution. 


When compared to the standard deviation of the method of moments estimator, the ratio varies 
from 1.47 for the zero-percent resolution detector to 1 .33 for the 50-percent resolution detector, which is 
roughly equivalent to giving away half of the detector’s collecting power by choosing the inferior method 
of moments estimation technique. 

Also shown was that the standard deviation of the estimate for both estimation procedures is 
inversely proportional to the square root of the sample size, so that halving the collecting power increases 
the standard deviation by a factor of yfl . This holds true for the standard deviation of ML estimate as long 
as it attains the Cramer-Rao lower bound, which it does when the number of GCR events exceeds = 1 ,200. 

Another important result is the relationship between the collecting power and the energy resolution 
of the detector. A measure of the detector’s ability to estimate the spectral parameter is its standard 
deviation and as seen in figures 6 and 7, the dominant component of the standard deviation of 0^ is 
attributable directly to the large fluctuations in GCR incident energies, being driven by the large variance 
and other higher moments of the simple power law distribution. This large component can only be reduced 
by increasing the number of events N that is controlled by the collection power of the dectector. A compari- 
son of the standard deviation of a ML for the generic detector discussed in this TP and when its collecting 
power is halved is given in figure 7. Table 2 provides the numerical results used to construct many of the 
figures in this section. 
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Geometry Factor Comparison: Half as Many Events 

— - Maximum Likelihood (Simulation) 

B Maximum Likelihood (A/=26,000) 



Detector Resolution (%) 


Figure 7. Comparing the effect of collecting power on the standard 
deviation of the maximum likelihood estimate of the 
spectral index Ct| . 


Table 2. Numerical values used to construct figures 6 and 7. 


£ 1= 20TeV, E 2 =5,500TeV, a 1 =2.8, /V ave rage=52,000 
events. 5,000 mission averages for simulation results. 

Detector Resolution 

0% 

20% 

40% 

50% 

1 . Method of moments (theory) 

0.0115 

0.0116 

0.0128 1 

0.0136 

2. Method of moments (simulation) 

0.0114 

0.0117 

0.0125 , 

0.0133 

3. Maximum likelihood (Cramer-Rao lower bound) 

0.00786 

Analytica 

solution not available 

4. Maximum likelihood (simulation) 

0.0078 

0.0083 

0.0092 

0.0100 

5. Mean detector response (GeV) (theory) 

130.66 

130.66 

131.58 

138.85 

6. Mean detector response (GeV) (simulation) 

130.66 

130.64 

130.64 

138.81 

7. Standard deviation (theory) 

192.07 

197.61 

213.69 

239.77 

8. Standard deviation (simulation) 

191.47 

196.86 

213.33 

238.82 

9. Coefficient of variation py (detector, %) 

147 

151 

162 

173 

F 1 =20TeV, £2=5,500 TeV, a,=2.8, A/ av eraae=26,000 





events. 5,000 mission averages for simulation results. 





10. Maximum likelihood 

0.0110 

0.0118 

0.0132 

0.0144 

11. Ratio of line 4 to line 10, compare to sqrt(2) 

1.41 

1.42 

1.43 

1.44 



3. BROKEN POWER LAW 


This energy spectrum suggests a transition from spectral index ot| below the knee location energy 
Ej. to a steeper spectral index cCjXX^ above the knee. The broken power law predicts that the number of 
protons detected above an energy E is given by: 1 


AT 


N ,(> E) = 


a, 


i7 v a i + V E V«2 +1 


a 2 ~ 1 


Ek 
\ e aj 


\ e aj 


for E > Ej, 


(32) 


N 0 (> E) - [N 0 (> E k ) - iV, (> E k )] for E < E k 


where E is in units TeV, N A and E A are 160 and 500 TeV as before, and currently available measurements 
suggest that eq is =2.8, a ' 2 is thought to be somewhere between 3.1 and 3.3, and E k is parameterized in the 
range 100-300 TeV for this research. N 0 i>E) is the number of protons detected above an energy E as 
defined in eq. (1); and as in the simple power law section, these simulation studies assume the number of 
events for a given mission follow the Poisson probability distribution with mean determined by eq. (32). 
Writing N 0 (>E) in eq. (32) as 


Nq(> E) = N a 


'El 

k e Aj 


-a { + i 


\ E kj 


-a i+l 


(33) 


and constructing the cdf as in eq. (2), then differentiating, gives the pdf of the broken power law over 
energy range [E { ,E 2 ] as 


(j) l (E;a h a 2 ,E k ) = 


( E 


\ E kJ 
( E ^ 


V E k ) 


for 

El 

V 

VI 

for 

h 

- E - e 2 


where the normalizing coefficient A is given by 


A - A(a ] ,a2, E k) = ' 


(«i -1)(cr 2 -l) 


a, -or 2 +(a 2 -l)l ■ ~^r 


1— or. 


-(«! - 1) 


V E k) 


(34) 


(35) 
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Note that <p l has “slope” a^/On below/above the knee location E k and is continuous at E k as 
required, and the single normalizing coefficient A in both mathematical terms of (p\ in eq. (34) provides a 
succinct mathematical form, making calculation of the log-likelihood function in the ML search algorithm 
computationally more efficient than other equivalent mathematical representations of <j>^ . The mean, vari- 
ance, and other important moments of the broken power law distribution can be obtained from the general 
form of <E m > given as 


E 2 

<E m >= J E m <f>\(E)dE 


= AEj? +l 


1 

1- 

f £ ‘l 


1 

1- 

(%] 


> 

m + 1 — Cf| 

y E kj 


m + 1 - a 2 





(36) 


which necessarily has dimension (TeV) w since A has dimension (TeV) -1 . A random sample of GCR proton 
event energies are obtained from the broken power law spectrum over the range \E^,E^\ as E- — 0| (m ; ), 

where m, is a random number from a standard uniform distibution and O, -1 represents the inverse function 
of the broken power law cdf . 


Figure 8 shows ZV,(>E) with a histogram (the ragged curve in fig. 8) constructed from simulated 
events from the broken power law'. /Vq(>£) is included in figure 8 for comparison with N ](>£) and clearly 
shows the transition from ct j to ocj at the knee E k , with the plots cropped at 1,000 TeV to better illustrate 
this so-called knee region. Parameters used in this example are a i = 2.8, 02=3.3, £^=100 TeV, £i=20 TeV, 
and £ 2 =5,500 TeV. 

Note that at the knee, the difference between V 0 (>100 TeV) and ^,(>100 TeV) is 626 events and 
reduces to 412 events when o 2 drops to 3.1. Another important observation is the significant reduction in 
the standard deviation of the incident energy when compared to the simple power law, which suggests that 
detector resolution will play a somewhat larger role in the overall contribution to the estimator’s standard 
deviation than it did in the case of a simple power law. The mean / i E , standard deviation ct £ , and coefficient 
of variation V=o E !(i E are given in table 3 for selected parameters for comparison. 
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Broken Power Law N^>E) and Simulated #,(>£) 
With Simple Power Law Nb(>£), 
oc 1= 2.8, 02=3.3, £,=20 TeV 



Figure 8. Comparison of N\(>E) with N 0 (>E). A histogram of 
simulated events from the broken power law are also 
included. 


Table 3. Means, standard deviations, and coefficient of variation (mathematically 
the same as resolution) for the simple power law and broken power law. 


Energy Range 
20-5,500 TeV 

Spectral Parameters 

Mean 

(TeV) 

Standard 
Deviation (TeV) 

Coefficient 
of Variation (%) 

Simple power law 

cci= 2.8 

44.50 

74.1 

166 

Broken power law 

ai=2. 8 , c( 2 = 3 . 1 , £*=100 TeV 

41.83 

54.17 

129 

Broken power law 

011 = 2 . 8 , a 2 = 3 . 3 , £*=100 TeV 

40.67 

45.54 

101 



3.1 Estimation of the Spectral Parameters a k , o^, and E k 

As suggested in the simple power law study in section 2, the ML procedure offers a superior 
approach for estimating the spectral parameters in terms of their known favorable statistical properties. 
Thus, concentration will be on obtaining the ML estimates of the three spectral indices of the broken power 
law distribution. For notational convenience, the vector 0=(a,. «>, E k ) consisting of the three broken 
power law spectral indices is introduced. 

The ML estimation procedure will be illustrated for a single mission by first estimating ©directly 
from the incident energies £, (equivalent to the so-called ideal detector having zero energy resolution;, and 
then from their simulated detector responses 7 ( using the same detector response function described in the 
simple power law section and for the case where cq= 2.8, oc 2 =3.3, £^=100 TeV, £ ( =20 TeV, and £2=5,500 
TeV. The results from many other parametric scenarios of interest will also be presented. 

3.1.1 Method of Maximum Likelihood for the Ideal Detector 

The likelihood function of a random sample of size N from the broken power law, regarded as a 
function of the unknown vector of parameters 0=(aj, 0 ^, E k ) is 


\ 


-a. 


£(0) = 4(0) 


N 


n f- 

. £ ,<£i £ *. 


f \ 

n §■ 

V £ J 2£ < ‘ , 


-« 2 


, £, < £,, Ej < E 2 , 


(37) 


where the first product is over the energies below the knee energy (E k ) and the second product is over those 
energies above E k , and they total in number to N, and A(0) is the coefficient given in eq. (35). The Nelder- 
Mead downhill simplex method is used to find the ML solution 0 ML that minimizes the objective function 
(minus the log-likelihood) defined as 


0(0) = -£(0) = -N log A(0) + a, 
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X lo § 
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F, 
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+ a 2 
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X l °g 

e j 

F, 


K E i <E k 

l L k] 



l^k J 
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(38) 


For the sample mission under consideration, the number of simulated events is N= 5 1 ,259 and is a 
random number generated from a Poisson distribution with mean )V|(>20 TeV)=5 1 ,576 (recall Aq= 52,200 
for the simple power law). Note that 2, 1 65 of these events are above the assumed knee location at 1 00 TeV. 
Also, the mean of these simulated incident energies is 40.28 TeV and standard deviation 40.79 TeV and can 
be compared with the bottom row of table 3. 

To obtain a reasonable starting point for the search procedure, it is first assumed that a, will be 
largely influenced by those energies (£,) thought to be well below the knee energy ( E k ), even though the 
true value of E k is unknown. For example, if all energies below 70 TeV (of which 49,094 are below 70 TeV, 
or 96 percent), with the assumption that a simple power law will dominate the statistical description of 
these event energies, then the ML estimate of a l is 2.81 using eq. (30). Next, keeping fixed at 2.81 
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and using the full set of simulated event energies, the other two parameters are fit using a two-dimensional 
simplex search for (c^E*), which yields a 2 = 3.3 1 7 and £ A =95.44TeV. The two-dimensional simplex search 
is illustrated in figure 9 and three things should be noted: (1) The knee energy (E k ) has been scaled by a 
factor of 0.1 so that it is fairly close in magnitude to the other two spectral parameters, (2) the simplex can 
leave the initial simplex region (but in this example, it returned), and (3) the simplex moves only one 
vertex per iteration. 



Figure 9. Two-dimensional simplex search for ( a 2 ,E k ). 


Next, 0 init i a ] =(2.8 1 0, 3.3 1 7, 95.44) is defined and used to construct the initial simplex for the three- 
dimensional search for 0 ML , where this simplex consists of the vertices of a tetrahedron centered at 0 init j a | 
with edge lengths in each coordinate axis taken to be 20 percent of each component of 0 ini(ia |. For the two- 
and three-dimensional searches, slightly different termination criteria are used and the relative difference 
in magnitudes of the three spectral parameters are considered. The search halts when (1) the maximum of 
the greatest relative distance of each of the three spectral parameters is each smaller than e x and (2) the 
maximum change in the objective function over each of the four vertices is <e 2 , so the simplex essentially 
shrinks to a very small, nonmoving tetrahedron at ©ml. Setting the e’s to the values discussed in the simple 
power law section, 0 ML =(2.801, 3.324, 94.95) is obtained. At this ML solution, note 2,434 of the 51,259 
simulated GCR energies are above the estimated knee location at 94.95 TeV, whereas only 2,165 are above 
the “true” location at 100 TeV. 

Also note that the two-stage approach for constructing a suitable initial simplex for the three- 
dimensional search produced in this example values of 0 initia | that are quite close to 0 ML , which of course 
is very desirable. However, in subsequent studies where the true knee location (E k ) is set to higher values 
such as 300 TeV, it was necessary to introduce a more sophisticated search because of the situation of 
multiple minima arising from the erosion of the asymptotic properties of the likelihood function as the 
number of events above the knee diminished. 
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Figure 10 shows a stereoscopic pair of the initial simplex tetrahedron and the first few steps, where 
only one vertex is moved per iteration, a,, c^, and E k are along the .xyz axis. The dot in the center is the 
tetrahedron at termination, and 0 ML is obtained from the coordinates of the last step upon halting. Dimen- 
sions have been scaled according to the termination criteria and also to facilitate viewing. 



Figure 10. Stereoscopic view of the first few movements of the 

Nelder-Mead downhill simplex search (cross-eyed stereo). 


As a check on the found solution, a coordinate frame is centered at 0 ML and then the objective 
function evaluated along each axis by an amount of ±10 percent of each value to measure the behavior of 
0(0) in the vicinity of 0 Ml The results are depicted in figure 1 1 and show that 0(0) is indeed a minimum 
at 0 ML . Note that variation in a, produces the greatest variation in the objective function, as one would 
expect, since it is a coefficient of 48,825 (95.2 percent) of the event energies below the estimated knee 
location at 94.95 TeV. 


Objective Function Around the Maximum Likelihood Solution 



80 . 984.5 
80,984 

80 . 983.5 
80,983 

80 . 982.5 
80,982 

80 . 981.5 
80,981 

80 . 980.5 
80,980 

80 . 979.5 
80,979 


Figure 1 1 . Objective function in the vicinity of the maximum likelihood solution 0 ML . 
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3.2 Estimation of the Spectral Indices With a “Real” Detector 

For each simulated GCR event energy £, from the broken power law spectrum, there is an associ- 
ated simulated detector response Y i according to the detector response function defined in eq. ( 1 9). The pdf 
of the detector response in the presence of the broken power law spectrum is thus given by 


E 2 

(y; , 0^2’ E /- ) = J g(y\E\p) (j)\{E \&\,(X 2 ’Ef i )dE, y>0, (39) 

E \ 

where the integral limits [£,,E 2 ] must be split as [E,,£J and [£ A ,£ 2 ] in the numerical integration. Figure 12 
depicts this pdf for several different values of the detector energy resolution (p). 


Broken Power Law, a 1 =2.8, a, =3.3, E k =100 TeV 
E, =20 TeV, £2=5,500 TeV 



Figure 12. Detector response probability density function for resolutions 
1 0, 20, 40, and 50 percent. 


Detector responses Yj for a detector with constant resolution p=0.40 are simulated and all other 
detector response function parameters are defined in the simple power law case, using the same set of 
51,259 incident energies £, from the broken power law spectrum considered in the zero-resolution case. 
The mean is calculated as 120.37 GeV and the standard deviation 123.99 GeV. Figure 13 compares prob- 
ability curves (greater than) on a log-log scale for the detector response distributions in the presence of the 
broken power law 0, and the simple power law 0 O . A log-log scale helps illustrate the difference between 
detector response distributions to the two different GCR energy spectra 0 q and 0] . A frequency histogram 
of the simulated detector responses to a broken power law is also provided in figure 13 (lower curves), 
although it is virtually indistinguishable from the theoretical function. 
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Detector Response Probability Distribution F\Y>/} to Broken 
Power Law <{»i, and Simple Power Law <|> 0 
ai =2.8, a 2 =3.3, F^IOOTeV, f 1= 20TeV, £,=5,500 TeV 



Figure 1 3. Detector response distributions Pr(Y>y) in the presence of 
a simple power law and and broken power law. Histogram 
of simulated responses to broken power law is also 
included. 


The simplex procedure is used to obtain the ML estimates 0 ML for the three spectral indices that 
minimize the objective function (minus the log-likelihood): 


N E 2 
0(Q) = -^Log | 
r=l [_£, 


g{yj\E\p)^i(E;B)dE . 


(40) 


Selection of a starting point for the three-dimensional search follows along similar lines to the zero- 
resolution energy case, but here only the detector responses T f are used. Again, assume the estimate of (X\ 
will be largely influenced by those detector responses Y { thought to be below the detector’s mean response 
to some GCR event believed to be below the knee E k , e.g., a 70-TeV event. Thus, a simple power law is fit 
to those detector responses below 196.76 GeV (mean response to a 70 TeV GCR proton and accounts for 
89 percent of all the detector responses in this simulated set), with the assumption that a simple power law 
will dominate the statistical description of these events. It is important to note that the present goal is to 
obtain a reasonable starting value of dfj . Even though some detector responses to incident energies below 
E k will end up above the detector mean response to E k and visa versa, the set of response energies below 
the mean detector response to a 70-TeV event given by jit(70TeV)= 196.76 GeV should be well represented 
by a simple power law. Thus, the conditional pdf g 0 0>«i I y< 196.76 GeV) is used and its associated 
objective function minimized in terms of ct\ to obtain 2.8. In practical terms, the front end of the detector 
response pdf g] is approximated with the detector response pdf g 0 associated with a simple power law. 
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Figure 14 shows the fitted detector response distribution 1-G 0 ( Y I Y< 196.76 GeV) with the detec- 
tor response histogram in the presence of the broken power law. Their difference is provided since the two 
curves are visually on top of each other. 


Starting Value of a, 
1-G 0 (y) Versus Histogram 



Figure 14. Approximating the front end of G ] with G 0 
(cumulative detector response distribution to 
simple power law). 


Holding a, fixed at 2.8 and using the full set of detector responses, a two-dimensional search for 
{ 02 , E /) yields a 2 =3.32 and E k = 96.8 TeV. Figure 15 shows the fitted distribution making the transition 
along the two parts of the broken power law distribution joined at the knee E k , and tracks the histogram of 
simulated detector responses. A simple power law response distribution given by Pr(F>y)=l-G 0 (>’) * s pro- 
vided for comparison. As before, a tetrahedron about 0 initial = (2.80, 3.32, 96.8) provides the initial simplex 
and then a three-dimensional search using all the detector responses yields 0 ML =(2.81, 3.38, 102.9). 

To check the ML solution, a coordinate frame is centered at 0 ML and the objective function evalu- 
ated along each axis by an amount of ±10 percent of each value to measure the behavior of 0(0) in the 
vicinity of 0^. The results are depicted in figure 16 and indicate that the objective function is indeed a 
minimum at 0 ml- A slightly more rigorous check was also performed in which the objective function was 
evaluated at each point of a random cloud consisting of 1,000 points surrounding 0 ML and for which 
0(0 ML ) was observed to be the smallest. 
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Objective Function (o^ , a,) Probability P( Y>y) 



Objective Function (£*) 







4. RESULTS 


Methods for obtaining the ML estimates of the three spectral parameters of the broken power law 
distribution from simulated detector responses have been developed, thereby enabling us to study various 
calorimeter design parameters and their impacts on the statistical properties of these ML estimates. The 
following studies are of particular interest and are included: (1) Statistical properties of the ML estimates 
and variation of the knee location and spectral break size, (2) data analysis range, (3) energy-dependent 
resolution, (4) non-Gaussian detector response functions, (5) collecting power versus energy resolution, 
and (6) implications of detector response model uncertainties. 

4.1 Statistical Properties of the Maximum Likelihood Estimates 
and Variation of the Knee Location and Spectral Break Size 

In this section, the statistical behavior of the ML estimates of the three spectral parameters based on 
simulating many missions is explored. Figure 17 shows relative frequency histograms of these estimates 
based on 1 ,000 simulated missions in which the spectral parameters were set to a, =2.8, a 2 =3.3, and E k = 1 00 
TeV for the data analysis range 20-5,500 TeV and a detector having a Gaussian response function with 
40-percent constant energy resolution. Note that the histograms are roughly Gaussian in shape but with a 
slight skewness to the right, exhibited for a 2 and E k but not a,. 


Relative Frequency Histogram of a 1 and a 2 

1,000 Missions, Broken Power Law With or 1 =2.8, 
a 2 =3.3, £*=100 TeV, and Incident Energy 20-5,500 TeV 

Detector Resolution 40% 



Relative Frequency Histogram of £* 

1,000 Missions, Broken Power Law With a 1 =2.8, 
a 2 =3.3, £*=100 TeV, and Incident Energy 20-5,500 TeV 

Detector Resolution 40% 



Energy (TeV) 


Figure 17. Relative frequency histograms of the maximum likelihood estimates of the spectral 
parameters a v a 2 , and E k of the broken power law energy spectrum. 
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These observations lead to a more general investigation of the asymptotic behavior of ML esti- 
mates. Table 4 provides a summary of these findings. The first column lists the Gaussian response function 
resolution (zero and 40 percent) for the studies presented in table 4, and the second column gives the 
average number of events above E\= 20 TeV used in each simulated mission, along with the average num- 
ber of events above the knee location E k given in parentheses for values of E A =100, 200, and 300 TeV. For 
example, the entry 51,576 (2,255) appearing in the first row indicates there are 51,576 events on average 
above 20 TeV for the baseline detector of which 2,255 of them would be above the knee location, £)=100 
TeV. The next three columns give the mean of each spectral parameter based on the simulation results, 
followed by the last three columns that give their respective standard deviations. The rows labeled as 
“Theoretical Limits” provide the input parameters for these simulation studies along with the Cramer-Rao 
bound which is the bound below which the variance of an estimator cannot fall 2 and is thus very important 
when comparing different estimation techniques. 

First, note in table 4 that as the true knee location ( E k ) is set at 100, 200, and 300 TeV in the 
simulations, an ever-increasing amount of bias is observed in the mean estimate of an< ^ E k d ue to the 
erosion of consistency (asymptotically unbiased) and is a direct consequence of the diminishing number of 
events above the knee, whereas the ML estimate of ct| continues to enjoy this favorable statistical property. 

The Cramer-Rao lower bound is provided for comparison with the standard deviations of the ML 
estimates obtained from the simulations. Note that while this theoretical minimum variance bound is nearly 
attained when the true knee location (E k ) is 100 TeV and the number of events above E k is over 2,000, the 
ability to achieve this lower bound gradually declines as the true knee location E k increases to 200 TeV, and 
then even more so when E k =300 TeV. The gradual growth in bias and inability to achieve the Cramer-Rao 
lower bound, coupled with a growing skewness in the frequency histograms of the estimates for and E k 
that indicate the asymptotic normality property is slipping away too, are symptoms of the increasing diffi- 
culty in estimating the spectral parameters when the true knee location ( E k ) is too high for this baseline 
detector. Furthermore, an investigation of the behavior of the objective function defined in eqs. (38) and 
(40) shows the emergence of multiple minima at these higher values of E k and is a condition that is 
observed to worsen with increasing E k . 
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Table 4. Asymptotic behavior of the maximum likelihood estimates for E k = 100, 200, 300 
TeV, collecting power IX (baseline) and 5X, with a special 6.4X detector only for 
the £^=300 TeV case. 


Detector 

Resolution 

{%) 

f*(TeV) 
N 1 (>20 TeV) 
(Ni (>F*)) 

Mean 

Standard Deviation 


«i 

a 2 

E k 

“i 


«2 

F* 


0 

100 

2.80 

3.30 

100 

0.012 


0.049 

6.6 

IX Theoretical Limit 


51,576 








(unbiased, Cramer-Rao LB) 

0 

(2,255) 

2.80 

3.31 

101 

0.012 


0.053 

7.6 

Simulation (3,000 missions) 

40 


2.80 

3.31 

102 

0.020 


0.076 

14.0 

Simulation (3,000 missions) 

0 

100 

2.80 

3.30 

100 

0.0052 

0.022 

3.0 

5X Theoretical Limit 


257,880 








(unbiased, Cramer-Rao LB) 

0 

(11,275) 

2.80 

3.30 

100 

0.0052 

0.022 

3.2 

Simulation (3,000 missions) 

40 


2.80 

3.30 

101 

0.0088 

0.033 

6.2 

Simulation (3,000 missions) 

0 

200 

2.80 

3.30 

200 

0.0094 

0.092 

23.4 

IX Theoretical Limit 


52,022 








(unbiased, Cramer-Rao LB) 

0 

(647) 

2.80 

3.32 

202 

0.0096 

0.11 

30.3 

Simulation (2,000 missions) 

40 


2.80 

3.33 

205 

0.013 


0.16 

47.7 

Simulation (2,000 missions) 

0 

200 

2.80 

3.30 

200 

0.0042 

0.041 

10.5 

5X Theoretical Limit 


260,110 








(unbiased, Cramer-Rao LB) 

0 

(3,235) 

2.80 

3.30 

200 

0.0042 

0.043 

11.8 

Simulation (2,000 missions) 

40 


2.80 

3.31 

201 

0.0059 

0.063 

20.3 

Simulation (2,000 missions) 

0 

300 

2.80 

3.30 

300 

0.0088 

0.13 

50.0 

IX Theoretical Limit 


52,116 








(unbiased, Cramer-Rao LB) 

0 

(312) 

2.80 

3.34 

310 

0.0087 

0.18 

71.0 

Simulation (3,000 missions) 










Simulation (3,000 missions) 

40 

Further Study 










Required 








_ 

0 

300 

2.80 

3.30 

300 

0.0039 

0.060 

22.3 

5X Theoretical Limit 


260,580 








(unbiased, Cramer-Rao LB) 

0 

(1,560) 

2.80 

3.31 

302 

0.0040 

0.067 

26.3 

Simulation (3,000 missions) 

40 


2.80 

3.31 

304 

0.0053 

0.092 

39.8 

Simulation (3,000 missions) 

0 

300 

2.80 

3.30 

300 

0.0035 

0.053 

19.7 

6.4X Theoretical Limit 


333,542 








(unbiased, Cramer-Rao LB) 

0 

(2,000) 

2.80 

3.30 

300 

0.0036 

0.056 

22.4 

Simulation (1,500 missions) 

40 


2.80 

3.31 

301 

0.0046 

0.078 

35.3 

Simulation (1,500 missions) 



As noted in table 4, the case where £ A =300 TeV and the detector’s energy resolution is 40 percent 
resulted in several errant estimates of (Xj and E k , which is perhaps an indication that a simple power law 
would provide an adequate explanation of these particular simulated “missions.’ - However, as indicated in 
table 4, these favorable statistical properties are largely restored when the collecting power is increased by 
a factor of 5 and reinforces the importance of collecting power. Furthermore, no errant estimates were 
observed. Figure 1 8 shows the effect of collecting power on the histograms of the estimate of the knee 
location when £' A =200 TeV and compares the baseline (outer curve) with a 2X (middle) and 5X (inner) 
detector. 


Frequency Histogram of ML Estimate ol Knee Location E k 



TeV 


Figure 18. Effect of collecting power on histogram of knee location estimates. 


It should be noted that the Cramer-Rao bound was derived for the ideal detector having zero energy 
resolution and shows those values of the knee location E k where one begins to see an erosion of the asymp- 
totic properties of ML estimates and the difficulties encountered with the multiple minima of the objective 
function. Attempts to derive the Cramer-Rao bound for a “real” detector having a nonzero resolution and 
involve the convolution integral in eq. (39) were found to be mathematically intractable. However, they 
can readily be numerically constructed using record-order difference equations. 

Also of interest is the correlation between the ML estimates of the three spectral parameters, a 
direct consequence of the mathematical definition of the broken power law in which the knee E k acts as a 
“hinge,” connecting the lower part of the distribution controlled by ot| with the upper part controlled by o^. 
Thus, one can easily visualize a correlation between cq and E k and and E k , while cc^ and appear to be 
only slightly correlated according to the simulation results. 
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For example, when a ( =2.8, «2=3.2, E [= 1 25 TeV, £|=20 TeV, ^2=5, 500 TeV, and the detector 
resolution is zero, the correlation matrix given in table 5 is based on 25,000 simulated missions. When the 
detector resolution is 40 percent and a Gaussian response function used, the correlation was seen to be 
slightly greater among the estimates of the three spectral parameters. 


Table 5. Correlation matrix based on 25,000 simulated missions. 



Correlation Matrix 


a i 

«2 

Ek 

«1 

1.00 

0.08 

0.42 


0.08 

1.00 

0.72 

E k 

0.42 

0.72 

1.00 


4.1.1 Spectral Break Size of 0.3 

The case where is set to 3.1 in the simulations and the so-called spectral break size is reduced to 
0.3 when a, remains fixed at 2.8 is of particular interest. Figure 19 shows relative frequency histograms of 
three estimates (cq o^, E ^ based on 1 ,000 simulated missions in which the GCR events were simulated 

from the broken power spectrum with aj=2.8, 0^=3. 1 , and E k = 1 00 TeV over the range 20-5,500 TeV for 
which the average number of events above 20 TeV is 5 1 ,800 and of which 2,500 are above the assumed knee 
location at 100 TeV. The detector is assumed to have a constant 40-percent energy resolution with a Gaussian 
response function. 


Relative Frequency Histogram of a, and a 2 

1,000 Missions, Broken Power Law With oc 1 =2.8, 
a 2 =3.1, £*=100 TeV, and Incident Energy 20-5,500 TeV 

Detector Resolution 40% 



Relative Frequency Histogram of £* 

1,000 Missions, Broken Power Law With a, =2.8, 
a 2 =3.1, £*=100 TeV, and Incident Energy 20-5,500 TeV 

Detector Resolution 40% 



Figure 19. Relative frequency histograms of the maximum likelihood estimates of the three 
spectral parameters cq, o^, E k of the broken power law energy spectrum. 
Detector response function is Gaussian having 40-percent constant energy 
resolution. 
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Also note that the mean and standard deviation of the incident GCR energies are /%=42 TeV and 
o E - 54 TeV, respectively, for this simulation scenario. Comparing to the case where a 2 =3 . 3 and the other 
parameters the same shows an average of 5 1 ,600 events above 20 TeV of which 2,250 are above the 
assumed knee location at 100 TeV and with fJ . E = 41 TeV and o E =46 TeV. Thus, the standard deviation is 
considerably larger for the a 2 =3. 1 case but also has =10 percent more events above the knee E k . 

Figures 20a and 20b compare standard deviations of the ML estimate of a t and a 2 , respectively, for 
the 02=3.1 with 02=3.3 case as a function of detector energy resolution. A somewhat surprising result is 
observed in figure 20b where the standard deviation of the O 2 estimate actually decreases when the spectral 
break size decreases from 0.5 to 0.3 and is attributable to the 10-percent increase in events above the knee, 
despite the increase in GCR incident energy variance (<J E increases as the break size decreases, and hence 
so does the standard deviation of the detector responses o Y which would tend to increase the standard 
deviation of the estimate of o^). Thus, as seen in figure 20b, the increase in events above the knee slightly 
outweighs the increase in variance associated with the decrease in spectral break size. Note in figure 20c 
the standard deviation of the E k estimate almost doubles when the spectral break size decreases from 0.5 to 
0.3, a more intuitive result. 


Standard Deviation of Spectral Parameter a 1 
When a 2 =3.1 and 3.3 

Energy Range 20-5,500 TeV 

— a 2 =3.1 -A- a 2 =3.3 



Figure 20a. Standard deviation of the maximum likelihood 
estimate of a j for the a 2 =3.1 and 0^=33 case 
as a function of detector (assumed Gaussian) 
resolution. 
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Standard Deviation of Spectral Parameter a 2 
When « 2 =3.1 and 3.3 

Energy Range 20-5,500 TeV 



Resolution (%) 


Figure 20b. Standard deviation of the maximum likelihood 
estimate of f° r the 02=3.1 and 02=3.3 case 
as a function of detector (assumed Gaussian) 
resolution. 


Standard Deviation of Spectral Parameter f^Kink Location, TeV) When 
02=3.1 and 3.3 

Energy Range 20-5,500 TeV 



Resolution (%) 


Figure 20c. Standard deviation of the maximum likelihood 
estimate of E k for the 02=3.1 and 02=3.3 case 
as a function of detector (assumed Gaussian) 
resolution. 
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Last, the asymptotic properties and correlation among the estimates is explored by simulating 100,000 
missions from the broken power distribution with a, =2.8, a 2 = 3.1, and £*=100 TeV over the range of 
20-5,500 TeV. This is accomplished using a detector having twice the collecting power of the baseline 
detector and thus providing 103,600 events on average above 20 TeV, of which =5,000 are above the 
assumed knee location at 100 TeV. The ideal or zero-resolution detector is also used for comparison with 
the Cramer-Rao bound which has only been derived for zero-resolution detectors. Table 6 gives the means, 
standard deviations, and Cramer-Rao bound for this scenario and table 7 gives the correlation matrix based 
on these 100,000 simulated missions. 


Table 6. Means, standard deviations, 
and Cramer-Rao bounds. 



Mean 

Standard 

Deviation 

Cramer-Rao 

Bound 

«1 

2.80 

0.0084 

0.0083 

a 2 

3.10 

0.032 

0.031 

E k 

100.5 TeV 

8.6 TeV 

7.6 TeV 


Table 7. Correlation matrix. 



a 1 

a 2 

Ek 

a 1 

1.00 

0.06 

0.47 

a 2 

0.06 

1.00 

0.68 

h 

0.47 

0.68 

1.00 


4.2 Data Analysis Range Study 

The energy range [£,,£ 2 ] from which GCR proton events are simulated has a significant impact on 
the statistical properties of the ML estimates. While increasing E 2 beyond 5,500 TeV has no noticeable 
effect since events of energy exceeding 5,500 TeV are very unlikely, lowering £, does have a significant 
impact on the standard deviation of the estimates of and E k . By lowering £j, many more events repre- 
sentative of that part of the broken power law below the knee and controlled by a x will be detected, along 
with the extension of the estimation range or “moment arm” for a t , the combination thereby providing 
greater precision in the estimation of a,. Furthermore, as a x is estimated with greater precision, E k can be 
measured with somewhat greater precision too since reducing the variation in <x j removes additional varia- 
tion in the “hinge” £*.. Hence, lowering the data analysis range results in a reduction in uncertainty of a, 
and E k and thus reduces the total uncertainty so that very slight gains in variance reduction in the estimate 
of a 2 is also realized. These results are depicted in figures 21a— 21c for a x , o^, and E k , when E x =30, 20, 15, 
and 10 TeV and for which there were on average 24,500, 5 1 ,500, 87,000, and 1 8 1 ,000 events, respectively, 
with =2,250 above the knee for each. Other parameters are aj=2.8, o^=3.3, E*=100 TeV, E 2 =5,500 TeV, 
and the response function is assumed Gaussian. 
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Standard Deviation of Spectral Parameter a, 
for Energy Ranges 30, 20, 15, 10 TeV - 5,500 TeV 


-0- £,=30 TeV -#-£,=15 TeV 

£,=20 TeV — ■ — £, =1 0 TeV 



Figure 2 1 a. Effects of lowering £, on the standard deviation of the estimate of or , . 


Standard Deviation of Spectral Parameter E k (Kink Location, TeV) 
for Energy Ranges 30, 20, 15, 10 TeV -5,500 TeV 

£,=30 TeV -•-£,=15 TeV 

--A-- £,=20 TeV ♦ £,=10 TeV 



Figure 21b. Effects of lowering £, on the standard deviation of the estimate of Ef.. 


Standard Deviation of Spectral Parameter o i 2 
tor Energy Ranges 30, 20, 15, 10 TeV- 5,500 TeV 

-0- £,=30 TeV -#-£,=15 TeV 

— A-- £,=20TeV -#-£,=10TeV 



Figure 21c. Effects of lowering £, on the standard deviation of the estimate of a ' 2 - 


4.3 Energy-Dependent Resolution Study 

The situation in which the detector response function is assumed to be Gaussian but the detector 
energy resolution varies with incident GCR event energy is of particular interest to designers of cosmic-ray 
detectors. In previous studies presented so far in this TP, the detector response function is assumed to be 
Gaussian with a linear mean response (energy deposit) of the form ( a + bE ) and with constant detector 
energy resolution p so that the parameter (Tin the Gaussian response function is defined as a(E)=p(a + bE). 
Two cases of interest are (1) energy resolution is “getting better” from 40-percent resolution at £,=20 TeV 
to 30 percent at £2 = 5,500 TeV and (2) “getting worse” from 30-percent resolution at £’j=20 TeV to 
40 percent at E 2 =5,500 TeV. These two cases are modeled by assuming that a{E) is a linear function of 
incident GCR energy of the form(c + dE) and then the coefficients c and (/ are determined by matching the 
conditions for each of the two cases. Doing so yields the energy-dependent resolution curves depicted in 
figure 22. 

Table 8 shows the results based on 100 simulated missions using the same incident GCR energies 
for both cases and the mean estimates shown are essentially unbiased, with standard deviations having 
expected comparisons; e.g., standard deviations slightly larger for the “getting worse” case. The constant 
40-percent case is included for comparison. 
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Nonconstant Resolution Curves 
for Energies Between 20-5,500 TeV 



10 100 1,000 10,000 

Incident Energy (TeV) 


Figure 22. Energy-dependent resolution curves. 
Table 8. Nonconstant energy resolution results. 


Mean and Standard Deviation of the Estimates Based on 100 Missions 





Resolution 




Constant 40% 

Nonconstant 
(Getting Better) 

Nona 

(Gettinf 

mstant 

Worse) 

Spectral 

Parameter 

Mean 

Standard 

Deviation 

Mean 

Standard 

Deviation 

Mean 

Standard 

Deviation 

a i 

2.80 

0.02 

2.794 

0.018 

2.794 

0.018 

«2 

3.33 

0.072 

3.309 

0.067 

3.312 

0.073 

E k 

100.7 

14.4 

99.63 

12.6 

99.93 

13.5 


4.4 Non-Gaussian Detector Response Functions 


The simulation studies presented so far have assumed a Gaussian detector response function. While 
reference 5 suggests that a Gaussian function is reasonable, there is concern that perhaps the response 
function is skewed slightly to the right and that this “tail will contribute to greater difficulties in estimat- 
ing the broken power law spectral parameters. The gamma response function, capable of describing a wide 
variety of shapes with right-hand skewness (outer curve from the right in fig. 23) and the broken-Gaussian 
consisting of two blended normal distributions (middle curve from right) suggested by reference 8 for its 
closeness to the Gaussian response function but with the tail region, as desired, were introduced to address 
this concern. Both were used as detector response functions in 1 ,000 simulated missions using the baseline 
detector collecting power and simulating GCR events from the broken power law with parameters a, =2.8, 
02=3.3, £ x = 1 00 TeV, from the range 20-5,500 TeV. The results are shown in table 9. Note that the gamma 
response function produces a slight bias in the estimate of the knee location that was removed in a subse- 
quent run with the collecting power doubled. Also note that the standard deviation of the estimate of O 2 
increases by =13 percent for both response models relative to the Gaussian response function having 
40-percent resolution. It should also be noted that while the gamma response function has a constant 
energy resolution of 40 percent, the broken Gaussian has a 41 -percent resolution because of the added 
skewness while keeping the rest of the distribution matching the Gaussian. 
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Detector Response Functions to 40 TeV 
Proton (40% Resolution) 



0 100 200 300 400 


Energy (GeV) 

Figure 23. Gamma, broken Gaussian, and Gaussian response functions. 


Table 9. Gaussian, broken Gaussian, and gamma response function study. 


Mean and Standard Deviation of Maximum Likelihood Estimates of Spectral Parameters (1 ,000 Missions) 


a i 

a 2 

£* 

Response Model 
(40% Resolution) 

Mean 

Standard 

Deviation 

Mean 

Standard 

Deviation 

Mean 

Standard 

Deviation 

Gaussian 

2.80 

0.020 

3.31 

0.072 

100.7 

14.4 

Broken Gaussian 

2.80 

0.021 

3.31 

0.082 

100.8 

14.9 

Gamma 

2.80 

0.023 

3.31 

0.082 

102.3 

16.1 


4.5 Collecting Power Versus Resolution Study 

Cosmic-ray instrument developers must often make trade studies in design parameters as a func- 
tion of the science objectives, which is very important for space-based detectors where physical param- 
eters, such as dimension and weight, impose rigorous practical limits to the design envelope. Particularly 
important is the comparison between detector energy resolution and collecting power (combination of 
detector size and observing time) two parameters often played against each other in the design phase of a 
new detector program. As seen in the simple power law section, the ability to measure the spectral param- 
eter cfj, measured in terms of the standard deviation as its estimator, depends rather weakly on resolution 
and strongly on collecting power as is evidenced in figure 7. Also observed was that the standard deviation 
is inversely proportional to the square root of the number of events, so that halving or doubling the collect- 
ing power scales the standard deviation by a factor of V2 for the ML estimate when the number of events 
exceeds around 2,000. As noted in table 3, the variance of the broken power law distribution (and its higher 
moments too, although not shown in table 3) is somewhat smaller than the variance of the simple power 
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law, implying the detector’s energy resolution will play a somewhat stronger role in the estimation of the 
three spectral parameters. Figures 24a-24c illustrate the relationship between collecting power and detec- 
tor energy resolution by showing the impact on the standard deviation of the three spectral parameters 
when the collecting power of the baseline detector is halved and then doubled. In this study, GCR events 
were simulated from the broken power law with parameters a,= 2.8, 02=3.3, E k =\ 00 TeV, from the energy 
range 20-5,500 TeV, and the baseline number of events is 5 1 ,600 above 20 TeV of which 2,250 are above 
the assumed knee at 1 00 TeV, In approximate terms, note that doubling the collecting power compares with 
about a 20-percent trade in resolution for o, and E k but also note that a 40-percent resolution detector is 
better than a zero-resolution detector of half its size relative for the event-starved a 2 parameter. 


Standard Deviation of Spectral Parameter a., 
tor 0.5X, IX, 2X Collecting Power 

Energy Range 20-5,500 TeV 


0.5X (A/=25,800) 
-Q- IX (A/=51 ,600) 
2X { A/=1 03,200) 



Figure 24a. Relationship between collecting power and energy 

resolution measured in terms of the standard deviation 
of the maximum likelihood estimate of cc { . 
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Standard Deviation of Spectral Parameter a 2 
tor 0.5X, IX, 2X Collecting Power 

Energy Range 20-5,500 TeV 


CSJ 


a 


ra 

03 



0 20 40 50 

Resolution (%) 


Figure 24b. Relationship between collecting power and energy 
resolution measured in terms of the standard 
deviation of the maximum likelihood estimate of o^. 


Standard Deviation of Spectral Parameter E k 
tor 0.5X, IX, 2X Collecting Power 

Energy Range 20-5,500 TeV 

0.5X (A/=25,800) 


1X(A/=51,600) 

2X (/V=1 03,200) 



Figure 24c. Relationship between collecting power and energy 

resolution measured in terms of the standard deviation 
of the maximum likelihood estimate of E k . 


It is important to note that the relationships illustrated in figures 24a-24c are independent of the 
energy range as similar comparisons were observed when £jwas lowered to 15TeV and to lOTeV. Raising 
E 2 has no effect since the number of events above £2=5,500 TeV is negligible for detectors with this 
collecting power. 

Because the Cramer-Rao lower bound always scales by -Jn for each of the three spectral param- 
eters and, as noted in table 4, the asymptotic properties (including attainment of the Cramer-Rao bound) of 
the ML estimates of ct-> and E A are nearly met whenever the number of events above the knee exceeds 
2,500, which is about the situation for the baseline detector collecting power when £ A =100 TeV, it can be 
seen that doubling the collecting power means the standard deviation of the ar) d £* estimators scales by 

V2 , but halving results in a factor of around 1 .5 instead of 1 .4 1 , as attainment of the Cramer-Rao bound is 
slipping away faster for the smaller detector. Obviously, as E k increases to 200 and 300 TeV as in table 4, 
the number of events above the knee diminishes too so that the bound is not attained, so scaling will not go 
by the -Jn until the collecting power is such that the number of events above £ A is =2,500 or more. This 
latter result is the rationale for selecting the hypothetical 5X detector in table 4 so that the number of events 
above £ A =200 is 3,235. Of course since the number of events representative of a, is always quite large and 
is on the order of 50,000 or greater when the lower limit of the data analysis range is 20 TeV or less for the 
baseline detector, scaling by -J~N will hold for the standard deviation of the ML estimate of CT|. 

4.6 Implications of Detector Response Model Uncertainties 

Maximum likelihood estimation of cosmic-ray spectral parameters as presented in this TP requires 
the complete specificity of all detector response model parameters. The reality of actually knowing these 
parameters with little or no surrounding uncertainty depends largely on designers being able to calibrate 
the detector at different incident energies at a particle accelerator facility. However, because space-based 
detectors will be exposed to GCR events having energy much greater than those energies available at 
accelerator facilities, it becomes essential to gain an understanding of the detector’s response function 
using Monte Carlo simulations of the detector’s response (energy deposit) to those energies that cannot be 
attained at accelerator facilities. These simulations, coupled with a favorable comparison between simula- 
tion results and accelerator results at energies available in a test facility, will provide a better understanding 
of the detector response function. 

By way of example, the impacts on spectral parameter estimation when certain detector response 
function parameters are incorrectly known are investigated next. This state of ignorance will manifest 
itself as a bias in the mean or point estimate of the spectral parameters. This situation is modeled by 
simulating detector responses according to one set of detector response function parameters and then using 
a different set of parameters in the detector response function g in eq. (40) of the ML estimation procedure . 

Since detector resolution is an important design parameter, the case is first considered where the 
detector has a constant energy resolution; however, a different resolution value was used in an assumed 
state of misunderstanding in eq. (40). For example, suppose the real detector resolution is a constant 
35 percent, but in the simplex search the resolution parameter (p) is set to different constant values in 
eq. (40) corresponding to resolutions ranging from 31 to 39 percent. This situation is modeled by simulat- 
ing the detector responses 7, as 
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Yj = (a + bEj)( 1 + 0.35 Z ( ) 


(41) 


according to eq. (18) and for GCR event energy £)• from an assumed broken power law with parameters 
a, =2.8, 02=3.3, and £ A =100 TeV, from the energy range 20-5,500 TeV and for an assumed Gaussian 
response model having 35-percent energy resolution. Z, is a Gaussian random number having zero mean 
and unit variance, along with the nonnegativity constaint Yj >0. Next, in the ML procedure, p is set to the 
different values in eq. (40) to obtain the ML estimates of the three spectral parameters. Table 10 at the end 
of this section shows the mean for each of ML spectral parameter estimates based on 100 simulated mis- 
sions, each where p is set to 0.3 1 , 0.32, . . . , 0.39 in eq. (40). 


Table 10. Implications of detector response model uncertainties. 



Assumed 

Resolution 

«1 

«2 

F*(TeV) 

Constant resolution versus 

31% 

2.76 

3.29 

96.6 

assumed constant 35% 

32% 

2.76 

3.29 

96.8 


33% 

2.77 

3.30 

97.4 


34% 

2.79 

3.30 

98.2 


35% 

2.80 

3.30 

99.4 


36% 

2.81 

3.31 

101 


37% 

2.83 

3.31 

103 


38% 

2.84 

3.32 

106 


39% 

2.86 

3.32 

109 

Nonconstant resolution versus 

Getting worse: 

2.88 

3.34 

145 

assumed constant 35% 

30% to 40% over 





20-5,500 TeV 





Getting better: 

2.65 

3.25 

67 


40% to 30% over 





20-5,500 TeV 




Gaussian versus assumed 

Method 1 

2.98 

3.38 

171 

Broken Gaussian 





Broken Gaussian versus 
assumed Gaussian 

Method 1 

2.53 

3.21 

67 

Broken Gaussian versus 
assumed Gaussian 

Method 2 

2.85 

3.32 

115 


Note that the mean estimates exhibit a bias as a result of using incorrect values of p in eq. (40). Also 
see in table 10 that when p=0.35 in eq. (40) and matches the “correct” resolution as used in eq. (41) to 
simulate the detector responses, the means of the ML estimates match the assumed spectral parameters 
used in the simulation, and thus there is no bias in the estimates. It was also noted that their variances were 
essentially unaffected and this example is akin to a misaligned riflescope that results in the rifle shooting 
off-axis from the line of sight but the shot group size remains unaffected. 
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Next, consider the situation where the real detector resolution is energy dependent, but a constant 
resolution of 35 percent is used in eq. (40). For example, if the real detector resolution is “getting better” 
over the simulated GCR energy range 20—5,500 TeV as shown in figure 22 but instead a constant p= 0.35 is 
used in eq. (40) in the simplex search for 0 ML and 100 simulated missions, very large biases result (given 
in table 10). Another case where the real resolution was “getting worse,” depicted in figure 22, was when 
a constant of 35-percent resolution was again used in eq. (40), resulting in the other large biases given in 
table 10. Based on these studies, one concludes that the real key is to understand what the true energy- 
resolution relationship is and not so much a matter that it has a particular mathematical form. However, as 
these studies indicate, designs having a constant resolution are more forgiving as long as the error amount 
is a constant. 

Another important study regards the so-called tails of the response function. The response func- 
tions depicted in figure 23 were used to address this concern. The results from these simulations are pre- 
sented in table 9 and indicate that while a “smaller tail” is desirable, having a larger tail is not as bad as 
perhaps feared. Of particular interest is the situation in which the real detector response function is Gaussian 
but in a state of ignorance, the broken Gaussian function is inserted as the detector response function g in 
eq. (40) in the ML search for 0 . Based on 1 ,000 mission averages, a large bias in the mean estimates of the 
spectral parameters is noted in table 10. In the case where the real detector response function is the broken 
Gaussian function, the Gaussian function was incorrectly used in eq. (40) and is also included in table 10, 
and again large biases are seen. These two cases are labeled as method I and will be compared to a revised 
technique labeled method 2. 

It should be noted that in method 1, as well as in all simulation studies presented so far in this TP, 
GCR events are simulated from an energy range E { to E 2 , where typically E 2 =5,500 TeV and E, is a value 
between 10 and 25 TeV. The choice of E 2 is based on the collecting power of the detector and is chosen 
such that there will only be a negligible number of events above E 2 . The selection of E, is largely dictated 
by the practical number of events that can be handled in the simulation and for a thousand or more mis- 
sions. Setting £, to =20 TeV proved to be a good working value since 50,000 events on average are 
generated for the baseline-sized detector that are representative of «, and hence provides a robust estimate 
of oc | for the unconstrained multistage approach of estimating the three spectral parameters; i.e., first 
fitting a,, then keeping a ( fixed at this value and fitting a 2 and E k , followed by the three-dimensional 
search for (a,, o^, £^) ML on the full set of energy deposits. The adequacy of this working value of Ej=20 
TeV is further reinforced by noting in figure 21c that the critical parameter oc 2 is essentially independent of 
lowering E| below 20 TeV when the knee location is 100 TeV or greater. 

Next, for each of these simulated GCR events, a detector response is simulated according to the 
assumed detector response function and then the full set of simulated responses are used to estimate the 
spectral parameters. However, because no energies below E x are simulated, frequency histograms of the 
simulated detector responses, which resemble the appropriate detector pdf shown in figure 1 2, do not 
match the front-end portion of a real cosmic-ray energy spectrum which does look like those depicted in 
figure 8. This difference or mismatch is an artifact of not generating events from below E x that would have 
otherwise had the effect of filling in this front-end portion of the histogram and consequently resembling a 
real cosmic-ray energy spectrum. 
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This difference is not critical when making relative comparisons of the effects of design parameters 
or energy spectrum parameters when detector response function parameters used to generated the simu- 
lated responses match those detector response function parameters used in eq. (40) in the simplex search 
for 0 ML ; i.e., implies a perfect understanding of the response function. However, when the impacts of 
response function uncertainties are studied, it is more important that the simulation techniques produce 
results that are closer to a real cosmic-ray energy response spectrum. To illustrate this point, suppose E ] is 
set to 5 TeV in the simulation and the baseline detector collecting power is used, along with a broken power 
law energy spectrum with parameters a^= 2.8, a 2 =3. 3, and E k = 1 00 TeV, and E 2 =5,5 00 TeV so that there 
will be around 634,000 events above 5 TeV. Next, if detector responses assuming a Gaussian response 
function with a constant 40-percent energy resolution are simulated, then there will be 477,400 responses 
on average <50 GeV, whereas there will be 459,400 responses <50 GeV if the broken-Gaussian response 
function depicted in figure 23 is used, or a difference of 1 8,000 events. This region of energy deposits <50 
GeV results in that portion of the histograms that are of the greatest mismatch between the Gaussian and 
broken-Gaussian detector response histograms and is an artifact of not having any events <5 TeV in the 
simulation, and it is also the same region that does not match a real cosmic-ray response spectrum. Thus, it 
is this large mismatch that is driving the large biases seen in table 10 for a i and E k when responses accord- 
ing to one of these response functions are simulated and then the other response function is used in eq. (40) 
of the simplex search for 0 ML to study the impact of incorrectly understanding the “tail” of the detector 
response function. 

The goal of method 2 is to make the histogram of the simulated detector responses match a real 
cosmic-ray energy spectrum when studying the effects of incorrectly known detector response function 
parameters so that a better estimate of their impact on the spectral parameter estimates is gained. This is 
achieved by placing a cut y c in the simulated detector responses and then dropping all responses <y c . In the 
simulation, the choice of y c dictates the value of £) because fj must be chosen so that the probability of 
events having energy <E l but producing detector responses >y c is negligible, which obviously depends on 
the detector’s energy resolution. For example, if y c .=60 GeV and a Gaussian response function having a 40- 
percent energy resolution and a mean response ( a + bE) is considered, as used for the baseline detector 
and defined in eq. (18), then £) can be any value <7 TeV, since only a negligible number of events from 
below 7 TeV will deposit more than 60 GeV. Selecting E ( =5 TeV provides =634,000 GCR events and 
setting y ( .=60 GeV and dropping all simulated detector responses smaller than y c produces a simulated 
response spectrum that does indeed look like a real response spectrum. Estimating the spectral parameters 
using only the simulated detector responses that are >y c as described here and for the case where the real 
detector response function is the broken Gaussian but a Gaussian function is inserted in eq. (40) in the 
simplex search for 0 ML which results in the much more modest and intuitive biases shown as method 2 in 
the last row of table 10. Varying the cut y c between 60 and 100 GeV produced similar results for all three 
spectral parameters, while lowering y c below 55 GeV resulted in the more severe bias obtained using 
method 1 and associated with the large front-end mismatch of the histograms. 

A very important practical benefit realized by introducing the cut y ( . is that the lower limit of inte- 
gration in eq. (40) can be any value £^<£ ] , which means that the ML procedure can be made independent 
of the range of integration, as long as E L is chosen wisely. Thus, the ML estimation procedure herein 
developed can now be applied to real cosmic-ray detector response data. It should be mentioned that cuts 
on the high end are not required, since any value E H >E 2 is suitable because the probability of events >E 2 
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are essentially zero. However, setting E h unnecessarily high would result in many unnecessary calcula- 
tions in the numerical integration of eq. (40). 

Introducing the cut y ( . requires a modification to the objective function in eq. (40) to handle the 
conditional detector response distribution. Thus, the objective function for method 2 becomes 


0(a x ,a 2 ,E k )=-\ogL=- X lo gki (>7 \yj > y c ;a x ,a 2 ,E k )] , 

j = 1 


where 


g\(yj\yj>y c -,cc\,a 2 ,E k ) = 



g(yj I £;p) 0 i ( E ;a x ,a 2 ,E k )dE 

_ 

1- Jsi (>’ ;a x ,a 2 ,E k )dy 

o 


>v 


■H- 


(43) 


From a simulation point of view, E,=5 TeV is about the lowest value that was used because of the 
vast number of generated events and the requirement to handle thousands of simulated missions which are 
needed to make meaningful inferences. Consequently, cuts much less than 60 GeV are generally not fea- 
sible in simulations designed to study detector response function uncertainties. However, cuts in real 
cosmic-ray data can be taken to be much lower since the spectrum is already filled in from events having 
energies much less than 5 TeV. 
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5. CONCLUSIONS 


Methods for estimating cosmic-ray spectral parameters from simulated detector responses with 
implications for detector design are presented in this TP. The method of ML estimation is seen to be the 
method of choice for estimating the single spectral parameter ot| of a simple power law spectrum in terms 
of minimum variance and other important statistical properties and was thus selected as the estimation 
procedure for the broken power law spectrum. Again, the ML estimates attained these favorable statistical 
properties when the true knee location was around 100 TeV, but then these properties gradually slipped 
away for knee locations of 200 TeV and greater. The case of a spectral break size of 0.3 was also investi- 
gated and the results compared with the 0.5 break-size case in figures 20a-20c. A data analysis range study 
was conducted and showed that significant improvements in the precision in estimating the slope a, below 
the knee and the location E k (but to a lesser degree) can be realized by lowering the lower limit of the 
simulation range £, but had essentially no impact on the estimation of the slope parameter a 2 above the 
knee. 


The effects of detector energy resolution, collecting power, as well as various functional forms for 
the detector response function and energy-dependent resolution functions have also been studied and these 
results presented in this TP. While the energy resolution observed plays a somewhat stronger role in the 
estimation of the spectral parameters of a broken power law energy spectrum relative to a simple power 
law, the ability to estimate these spectral parameters, measured in terms of their standard deviations, still 
depends rather weakly on resolution and strongly on collecting power. 

While increasing the size of the right-hand tail of the detector response function did indeed cause a 
slight rise in the standard deviation of the estimates of the three spectral parameters (greatest for c^), the 
ML estimation procedure yielded estimates that, from a practical point of view, are unbiased. Similar 
results were gleaned from the studies using energy-dependent resolution functions. The implications of 
detector response model uncertainties were also investigated and the magnitude of such induced biases for 
various uncertainties presented. Cuts in the detector response data were introduced to simulate a more 
realistic cosmic-ray response spectrum and thereby provide a better description of the induced biases in the 
spectral parameter estimates when detector parameters are incorrectly known. Introduction of these cuts 
yielded the additional benefit of freeing the integral used in the ML procedure of requiring unique integra- 
tion limits, thereby making this ML estimation procedure applicable to real cosmic-ray data. 
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APPENDIX A— SALIENT RESULTS AND THEIR APPLICATION 
TO DESIGN OF SPACE-BASED COSMIC-RAY DETECTORS 


A number of the salient results from this research and their application to the design of space- 
based cosmic ray deterctors are presented in appendix A. 


NASA/MSFC 

Spectral De-Convolution 

Estimating Cosmic Ray Spectral Parameters from Simulated Detector Responses 
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• Study the statistical behavior of (a 1? E k ) ML over many missions 

• Vary calorimeter size, resolution, response function, and spectral parameters, etc. 

asymptotically minimum variance, consistent (unbiased for large samples), and normal. Kendall & Stuart, Advanced Theory of StatK 




leters: 


















Relative Frequency Histograms of ((Xj , E k ) ML for 1000 Missions 

Events were simulated from^ with a u =2.8, a 2 =3.3, E k = 100 TeV over the range 
20 TeV < Ej < 5,500 TeV, for which N Average = 51,600 (2,250 > 100 TeV) 
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Spectral Parameter al and a2 Energy (TeV) 




Standard Error of the ML Estimates (a r a 2 , E k ) ML vs Detector Resolution 

Events simulated fromc|) , with (^=2.8, a 2 =3.3, E k =100 TeV, 1000 Missions 

Gaussian Response Function 
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Component due to Event Statistics 

(independent of detector response model) 





Data Analysis Range Study: Lowering 
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NASA/MSFC 



Calorimeter Size and Resolution Study: 20 < E; <5,500 TeV 

(J)j with 0Cj =2.8, 0*2 =3.3, E k = 100 TeV, Gaussian Response Function 1000 Missions 




(A»ll >13 iP "-us P*P u*}$ 
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NASA/MSFC 





Calorimeter Size and Resolution Study: 20 < E i <5,500 TeV 

(J)j with a 1 =2.8, a 2 =3.3, = 100 TeV, Gaussian Response Function 1000 Missions 



*3 P JOJJ3 pnpueis 
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NASA/MSFC 



Estimation of the Three Spectral Parameters: 0% and 40% Resolution, 04 =3.1 

Events from^ with a, =2.8, a 2 =3.1, = 100 TeV 20 TeV < Ej < 5,500 TeV, 

N A verage = 51,800 (-2,500 > 100 TeV) 
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NASAIMSFC 





Relative Frequency Histograms of (oc^ o^, E k ) ML for 1000 Missions when a 2 =3.1 

Events were simulated from with a, =2.8, a 2 =3.1, E k = 100 TeV over the range 
20 TeV < Ej < 5,500 TeV, for which N Average = 51,800 (2,500 > 100 TeV) 



58 


2,9 Special Paraft^er al a.2 3.3 




m 

rn 

k 

TS 

a 

vi 


> 

<D 

H 


O 

rn S 
ll ^ 

cT 1 ? 

g S 


<D 

W) 
CJ 
>-9 ed 


£ 

^ >. 
w EP 

^ <D 

<(3 

= 5 

© 2 

u II 

m 


-a 

u 


w 


A oo 

■g CM 

§ || 

c« d 

.9 •'g 

is * 

a 

£ 

© 

u 


Jr ro 
to ■ 
Q. « 
— P 
TO C 

£ «J 
g t- 

a 7? 
w 1 
8 
O 

L. C 

o ® 

m f 

p 


p 

c 

TO 


CM 

TO 

? 

TO 

a: 


2 

LLi 



J,ft p jojjg pjspuc^s 


d) 

x 


> 

d> 

H 

o 

o 


d 

o3 

C 

o 


1-H o3 

A ' H 

A > 

O d> 
O T3 

^T3 

7 £3 

wT3 

o § 
o 
oc 


c/3 


*-■ c 

in d 

11 I 

* & 

. Uh 

m 2 
d 
<D 


d> 

c3 

"e" 

o 


o 

c/3 

d> 

t-4 

d> * 

as 

C/3 , i 

d> |i 

£ “w 

o o. 

•4— > r. 

o 

d> 


o 

C/3 


T3 

d 

d 


d 2 


o> 

C/3 

Ctf 

d> 

O 

fi 

d> 


N® 

ON 


> 
<u 
-§H 

£ II 

4J W 

o D 

d 


> 

o> 

H 
o 
o 

O O' 
in <N 
CN ^ 

7? 

< w 

© °- 
o 

VO 


Si ^ 
o X 

?d t - 


C/3 

CD 
O 

w ^ d> 

d a ^ 

a 

*-» . 
d d> 

X) <*L S5 

. CV cd 




in 


m 


> 

d> 

H 

VO 

Tf 

II 


d) 

o 

d 

X 


o 
s: K 
.5 a 


a> 

> 

o 


II 

cm 

8 


.« ;z 

~ -O 

w A *3 

_£) ^3 O 
«* 13 £ 
<D Ss r-N 

'o 


<u 

cn 

d) 

Vh 

CJ 

d 

d> 

nS 

C/3 

X 

W) 

• 

d> 

d 

o 

4-J ' 

X 

W) 


<D 

<U 


d 

cu 


c ^ 
d) T 3 

ll 


O 

C/3 

d> 


> 

<u 

r^! 

II 

> 

i 

u 

Vm 

^+-1 

C3 

H 

a 

H 

* 

T3 

> 

(N 

M- 

d 

d> 

T— H 

d" 

no 

o x 

D 

C/3 

II 

|| 

O 

<D 

w 


w 


O 

=1 

!> 

=L 

i 



d> 
> 
o 

X 

cd 

d 

d> 
> 
<D 


<2 

CD 

i-i 

CU 


<u 

o 


tn 

03 

> 



o 

in 


o 


o 

CO 


o 

CM 


cT 



59 


Resolution 





2-D Histogram of (c^ , a 2 ) ML 

Events from <h with a h =2.8, 04 =3.2, E k = 125 TeV over the range 
20 TeV<Ei< 5,500 TeV, for which N Average 51,900 (1,600 >125 TeV) - baseline detector 

Gaussian Response Function, 40% Resolution - 25,000 Simulated Missions 
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NASA/MSFC 




2-D Histogram of (ctj , ) ML - Zero Resolution 

Events from with =2.8,a 2 =3.2, E k = 125 TeV over the range 

20 TeV <E- < 5,500TeV, for which N Average 51,900 (1,600 > 125TeV) - baseline siz> 

25,000 Simulated Missions 
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NASA/MSFC 



2-D Histogram of Standardized ((Xj , E k )m L and (a 2 , E k ) ML . Zero Resolution 

Events from (J)j with a, =2.8, cc 2 =3.2, E k = 125 TeV over the range 
20 TeV < Ej < 5,500 TeV, for which N Average 51,900 (1,600 > 125TeV) - baseline size 
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NASA/MSFC 




2-D Histogramof Standardized^.,, E k ) ML and (a 2 , E k ) ML .Zero Resolution 
Events frorfi, witha., =2.8, a 2 =3.1,E k = lOOTeV over the range 
20TeV < Ej< 5,500 TeV, for which N Average =103,600 (5,000>100TeV) -2X baseline size 
________ 100,000 Missions 
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Asymptotic Properties of the Maximum Likelihood Estimates as Knee Location Varies 

a,, =2.8, a 2 =3.3, E k = 100, 200, 300** TeV, 

Data Analysis Range 20 - 5,500TeV Baseline Detector 
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Rule of thumb: need about 2000 events or more abov£ k to achieve CR Bound 

*Bound below which the variance of an estimator cannot fall. 

Kendall and Stewart, Advanced Theory of Statistics 




Asymptotic Properties of the Maximum Likelihood Estimates 

as Knee Location Varies 

oc„ =2.8, a 2 =3.3, E k = 100, 200, 300 TeV, 
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Study of Kink Location and Asymptotic Properties of the Maximum Likelihood Estimates 

a, =2.8,062 =3.3, ^ = 100, 200, 300 TeV, Data Analysis Range 20 - 5,500 TeV 
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• A few spurious results occurred for this case, and a simple power law might have provided a 
better fit to the “data” thus suggesting the need for hypothesis testing 

NASA/MSFC 



Calorimeter Size and Resolution Study 
Histogram of ML Estimates of Knee Location for Baseline, 2X, 5X Detector 

Events from <()j with oc j =2.8, ot 2 =3.3, = 200 TeV, 20 < Ej <5,500 TeV , 1000 Missions 
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NASA/MSFC 




Calorimeter Size and Resolution Study 
Histogram of ML Estimates of Knee Location for Baseline, 5X, and KLEM 

Events from (]), with a, =2.8, a 2 =3.3, = 300 TeV,20 < E s <5,500 TeV , 1000 Missions 

GaussianResponse Function, 40% Resolution for Baseline * and 5X Detector, 
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data” thus suggesting the need for hypothesis testing. 





How Well do We Need to Know our Detector Response Function? 
Uncertainties in Gaussian Response Model Parameters 

What if the true resolution is 35%, but we think it is .... 
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Uncertainties in Gaussian Response Model Parameters 

True Resolution is Non-Constant, but we assume it is a constant(35%) 
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Gaussian (40%) vs “Broken” Gaussian - Method 1 

(Jjj with a, =2.8, =3.3, E k = 100 TeV, 20 < E i <5,500 TeV, 1000 Missions 
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Uncertainties in Detector Response Models 
Gaussian (40%) vs “Broken” Gaussian 

<J), with a, =2.8, a 2 =3.3, = 100 TeV, 20 < Ej <5,500 TeV, 25,000 Missions 
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Case 2: Assumed Gaussian but really “Broken” Gaussian. ^=2.53, a 2 =3.1, E k =66 TeV 



Uncertainties in Detector Response Models 
Assumed Gaussian (40%) but Really “Broken” Gaussian - Method 1 

(J), with a, =2.8, a 2 =3.3,E k = 100 TeV, 20 < Ej <5,500 TeV 
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NASA/MSFC 



Uncertainties in Detector Response Models: Log-Log Plot of Previous Chart 
Assumed Gaussian (40%) but Really “Broken” Gaussian - Method 2 

4>j with =2.8, a 2 =3.3,E k = 100 TeV, 5 <Ej <5,500 TeV (634,083 events) 
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NASA/MSFC 





Uncertainties in Detector Response Models 
Assumed Gaussian (40%) but Really “Broken” Gaussian- Method 2 

<(), with cc, =2.8,02 =3.3, E k = 100 TeV, 5 <Ej <5,500 TeV (634,083 events) 
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NASA/MSFC 



Uncertainties in Detector Response Models 
Assumed Gaussian (40%) but Really “Broken” Gaussian - Method 2 

<(> j with a, =2.8, (X2 =3.3,E k = 100 TeV, 5 <Ej <5,500 TeV (634,083 events) 

Method 2- Make the response spectrum look more like a “real” spectrum by placing a cut in the 
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T his makes it applicable to “real” data sets 

Bonus 2: Extends to multiple data sets/detectors with likelihood functionsL k , so that 
0(a!,a 2 ,E k )= - log L k since L = c:L k 



Glossary of Terms and Abbreviations 
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KLEM is a Russian concept for an energy-measuring device that uses the number and angular distribution of 
all secondaries from the first interaction of the cosmic ray to estimate energy. It gives somewhat poorer energy 
resolution but is very light weight so can be much bigger than a calorimeter. 



Broken Power Law Probability Density Function 





<N 

s' 


3<s}ueA3 ;o JsqiunN 


79 


m + l-aj [£ k I m + ] T a 2 








o> 

<T$ 



h- 

T“ 


sr 

o N 

h- 

o 


O 

Tf 

tn 


CD 

O 

p 



d 




LO 

00 



h- 

CM 


sr 

CM 

CO 


o 

CM 

CM 


LO 

O 

O 

CL 


d 

t- 

c 








o 






T"“ 

LO 



CM 

CM 

o 

0 s 

CD 

CD 


O 

O 

O 

0> 

Tt 

O 

o 

a: 

C 

3 


d 

T— 




CO 


CO 

CO 

c 

o 


3 

o 

w 

O 

O 

o 

O 

CO 

O 

p 



d 




N- 



Vp 

o 



o 

LU 

T— 


CM 

CD 




c\i 



nP 


1 


o 

o 



T — 


1 

■ 

1 

“O > 
<D £ 

16 !a 


1 

1 

O 03 
C -Q 

P" 

1 

1 

2 2 
1- CL 



















Application to “Real” Data:use conditional distribution g(yly>y c ) 

Range of Incident Proton Energies E (TeV) 

Not simulated I Simulated Range I Less than one event Im 2 l3-yr 
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Estimating Spectral Parameters from Simulated Detector Responses 
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• Study the statistical behavior of (Xj,a 2 , E k ) ML over these many missions 

• Vary calorimeter size, resolution, response function, and spectral parameters, etc. 

asymptotically minimum variance, consistent (unbiased for large samples), and normal. Kendall & Stuar t&dvanced Theory of Statistics 
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Resolution 







Detector Response Function Study: 

Gaussian, Gamma, “Broken”Gaussian 

<J>! with a, =2.8,02 =3.3, ^ = 200 TeV, 20 <E; <5,500 TeV, 1000 Missions 
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ML estimate of oCj (p=0.40) and Mean Incident Energy 
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Detector Resolution 



Checking the Maximum Likelihood Estimates a 2 , E k ) 


— i 

s 



Objective Function (EJ 



Objective function evaluated along Objective function evaluated at random 

orthogonal lines through (a, a 2 , ^ ) ML points in the vicinity of (CX] a 2 , ) ML , 

then put in ascending order. 





Objective Function in the Neighborhood of(aj, a 2 , E^) 
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Objective Function in the Neighborhood of(0Cj a 2 , E k ) ML 

(with Oj fixed) 

Events from ((), with a, =2.8,a 2 =3.3,E k = 100 TeV over the range 
20 TeV < Ej < 5,500 TeV, for which N Average 51,600 (2,250 > lOOTeV) rr- baseline detector 
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Objective Function in the Neighborhood of(aj a 2 , E|<)mL 

(with a x fixed) 

Events from^j with a, =2.8,a 2 =3.3,E k = 200 TeV over the range 
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Objective Function in the Neighborhood of(aj a 2 , E k ^ , (with fixed) 

Events from (]), with =2.8, a 2 =3.3,E k = 200 TeV over the range 

20 TeV < Ej < 5,500 TeV, for which N Average 52,000 (650 > 200 TeV) - baseline detector 
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Objective Function in the Neighborhood ofCoij a 2 , ) ML 

(with affixed) 

Events from (J), with (Xj =2.8, a 2 =3.3,E k = 300 TeV over the range 
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Objective Function in the Neighborhood of(a r , E k ) ML 

(with fixed) 

Events from <{), with a! =2.8,a 2 =3.3,E k = 300 TeV over the range 
20 TeV<E i <5,500TeV, for which N Average 52,100 (300 > 300 TeV) - baseline detector 


Objective 

Function 
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Objective Function in the Neighborhood of(a 1? a 2 , E k ) ML 

(with a 2 fixed) 

Events from ^ with a,, =2.8,a 2 =3. 3,1^ = 300 TeV over the range 
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Detector Response Surface: Gaussian 

Energy Range 20 - 100 TeV, 40%-resolution 





Detector Response Surface: Gamma 

Energy Range 20-100 TeV, 40%-resolution 



Incident Energy (TeV) 






Detector Response Surface: Broken Gaussian 

Energy Range 20 - 100 TeV, 40%-resolution 
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Estimation of the Three Spectral Indices^ a 2 , E k forlOOO Missions 

Events were simulated from(J) 1 with dj =2.8, =3.1, E k = 100 TeV 



Estimation of the Three Spectral Indices a 2 , E k forlOOO Missions 

Events were simulated from(j)j with dj =2.8, a 2 =3.1, = 100 TeV 




Shows Cramer-Rao Lower bound for the standard deviation of any estimator of the knee 
location Ek and for a2 (slope above knee) as a function of true knee location for ideal detector 
with p=0. a, is fixed at 2.8,a 2 =3.1 and 3.3, 100 < ^ < 600, and data analysis range 20-5,500 
TeV. 
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100 300 500 100 300 500 

Knlee Location (TeV) Knee Location (TeV) 






Shows where (a 2 - 2 a a2 , CR ) cross with (a! + 2 a aI , CR ) as a function of knee location for 
ideal detector with p=0 (and a half-size detector). This isNO substitute for hypothesis 
testing, but might suggest “a trouble zone” even for the zero resolution detectoiAND using 
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Knee Location Ek (TeV) Knee Location Ek < TeV ) 







Calorimeter Size and Resolution Study: 20 < Ej <5,500 TeV 

=2.8, a 2 =3.3, E k = 100 TeV, Gaussian Response Function 1000 Missions 
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Recall that RMS! Mean is 51% whena/ji = 0.6 for Gaussian response function 




Detector Response Probability Density Function for 
Resolutions 10%, 20%, 40%, and 60% 

Broken Power Law,a-i=2.8, ai=3.3, E k =100TeV 
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Y (GeV) 









Maximum Likelihood and Method of Moments 
Estimator of Spectral Prameter a, (simple 

power law) versus Detector Resolution 
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